[00:23:49] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[00:24:51] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.078 second response time https://phabricator.wikimedia.org/T174916
[00:30:59] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[01:00:55] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.979 second response time https://phabricator.wikimedia.org/T174916
[01:04:37] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[01:20:15] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:26:15] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:59:47] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.900 second response time https://phabricator.wikimedia.org/T174916
[02:03:33] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:30:07] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.283 second response time https://phabricator.wikimedia.org/T174916
[02:33:43] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:39:39] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.265 second response time https://phabricator.wikimedia.org/T174916
[02:45:45] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:51:35] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.257 second response time https://phabricator.wikimedia.org/T174916
[02:55:19] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[02:57:39] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.180 second response time https://phabricator.wikimedia.org/T174916
[03:01:17] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[03:12:01] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.619 second response time https://phabricator.wikimedia.org/T174916
[03:15:49] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[03:18:06] (CR) BryanDavis: [C: +1] php72: Switch from thirdparty/php72 to component/php72
[docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/495291 (https://phabricator.wikimedia.org/T216712) (owner: Legoktm)
[03:26:50] !log kartik@deploy1001 Started deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878)
[03:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:55] T216044: CX2: Article considered as a whole translation unit, ignoring paragraphs and failing translation - https://phabricator.wikimedia.org/T216044
[03:26:56] T217878: CX2: Templates are not adapted after source page inpect mode integration - https://phabricator.wikimedia.org/T217878
[03:30:13] (PS7) Mathew.onipe: elasticsearch: refactor elastic icinga checks [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921)
[03:30:51] !log kartik@deploy1001 Finished deploy [cxserver/deploy@101bebd]: Update cxserver to 5a26308 (T216044, T217878) (duration: 04m 01s)
[03:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:31:36] (CR) Mathew.onipe: elasticsearch: refactor elastic icinga checks (4 comments) [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921) (owner: Mathew.onipe)
[03:33:49] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.346 second response time https://phabricator.wikimedia.org/T174916
[03:37:27] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[03:46:29] Operations, MobileFrontend, TechCom, Traffic, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Force_Radical) Could we sort of default to the last site domain the user has visited ? I mean...
[03:49:27] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 7.826 second response time https://phabricator.wikimedia.org/T174916
[03:53:05] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[04:00:04] kart_: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T0400).
[04:00:09] Yep
[04:00:59] !log Started manual run of unpublished ContentTranslation draft purge script (T217818)
[04:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:01:12] T217818: Run unpublished draft purge script for CX (Week of 03/10) - https://phabricator.wikimedia.org/T217818
[04:03:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.292 second response time https://phabricator.wikimedia.org/T174916
[04:07:27] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[04:13:29] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.716 second response time https://phabricator.wikimedia.org/T174916
[04:17:11] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[04:20:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.720 second response time https://phabricator.wikimedia.org/T174916
[04:24:25] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[04:29:42] (PS4) BryanDavis: wmcs: Add profiles for oidentd proxy and client modes [puppet] - https://gerrit.wikimedia.org/r/493767 (https://phabricator.wikimedia.org/T151704)
[05:05:05] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.072 second response
time https://phabricator.wikimedia.org/T174916
[05:08:47] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[05:13:35] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.578 second response time https://phabricator.wikimedia.org/T174916
[05:18:31] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[05:19:39] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.039 second response time https://phabricator.wikimedia.org/T174916
[05:25:45] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[05:36:31] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 6.245 second response time https://phabricator.wikimedia.org/T174916
[05:40:09] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[05:48:25] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.721 second response time https://phabricator.wikimedia.org/T174916
[05:52:09] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[05:53:11] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 1.766 second response time https://phabricator.wikimedia.org/T174916
[05:56:53] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://phabricator.wikimedia.org/T174916
[06:02:36] !log Upgrade MySQL on dbstore1004 (s2, s3, s4)
[06:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:02:59] !log Restarting pdfrender on scb1003
[06:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:03:55] RECOVERY - pdfrender on
scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.074 second response time https://phabricator.wikimedia.org/T174916
[06:29:05] (PS1) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495635
[06:30:08] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495635 (owner: Marostegui)
[06:31:08] (Merged) jenkins-bot: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495635 (owner: Marostegui)
[06:32:48] Operations, Wikimedia-SVG-rendering, Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (jijiki)
[06:34:58] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1099 for upgrade (duration: 03m 01s)
[06:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:01] !log Upgrade mysql and kernel on db1099
[06:35:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:35:05] mw1280.eqiad.wmnet down?
[06:36:25] Operations, Wikimedia-SVG-rendering, Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (jijiki)
[06:36:28] Operations, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (jijiki)
[06:36:46] Operations, Thumbor, Wikimedia-SVG-rendering, Upstream: librsvg misinterpret quoted font family names that contain whitespaces - https://phabricator.wikimedia.org/T64987 (jijiki)
[06:37:10] !log Power cycle mw1280 - server down
[06:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:15] (CR) jenkins-bot: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495635 (owner: Marostegui)
[06:38:20] Operations, Wikimedia-SVG-rendering, Upstream: Update librsvg to ≥2.42.3 - https://phabricator.wikimedia.org/T193352 (jijiki)
[06:38:41] !log Finished manual run of unpublished ContentTranslation draft purge script (T217818)
[06:38:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:38:44] T217818: Run unpublished draft purge script for CX (Week of 03/10) - https://phabricator.wikimedia.org/T217818
[06:41:13] RECOVERY - Host mw1280 is UP: PING OK - Packet loss = 0%, RTA = 36.86 ms
[06:45:24] (PS1) Marostegui: db-eqiad.php: Slowly repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495636
[06:47:24] (CR) Marostegui: [C: +2] db-eqiad.php: Slowly repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495636 (owner: Marostegui)
[06:48:22] (Merged) jenkins-bot: db-eqiad.php: Slowly repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495636 (owner: Marostegui)
[06:48:36] (CR) jenkins-bot: db-eqiad.php: Slowly repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495636 (owner: Marostegui)
[06:49:31] !log marostegui@deploy1001 Synchronized
wmf-config/db-eqiad.php: Slowly repool db1099 after upgrade (duration: 00m 52s)
[06:49:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:55:49] (PS1) Marostegui: db-eqiad.php: More traffic to db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495638
[06:56:54] (CR) Marostegui: [C: +2] db-eqiad.php: More traffic to db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495638 (owner: Marostegui)
[06:57:51] (Merged) jenkins-bot: db-eqiad.php: More traffic to db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495638 (owner: Marostegui)
[06:58:53] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
[06:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:59:41] (CR) jenkins-bot: db-eqiad.php: More traffic to db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495638 (owner: Marostegui)
[07:03:21] Operations, serviceops: mw1280 crashed - https://phabricator.wikimedia.org/T218006 (jijiki) p:Triage→Normal
[07:03:41] Operations, serviceops: mw1280 crashed - https://phabricator.wikimedia.org/T218006 (jijiki)
[07:03:47] Operations, serviceops: mw1280 crashed - https://phabricator.wikimedia.org/T218006 (Marostegui) I power cycled it from the idrac (it was totally stuck)
[07:09:35] (PS1) Marostegui: db-eqiad.php: Increase traffic for db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495639
[07:10:59] (CR) Marostegui: [C: +2] db-eqiad.php: Increase traffic for db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495639 (owner: Marostegui)
[07:11:58] (Merged) jenkins-bot: db-eqiad.php: Increase traffic for db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495639 (owner: Marostegui)
[07:12:11] (CR) jenkins-bot: db-eqiad.php: Increase traffic for db1099 [mediawiki-config] -
https://gerrit.wikimedia.org/r/495639 (owner: Marostegui)
[07:13:06] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Give more traffic to db1099 after upgrade (duration: 00m 48s)
[07:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:29:51] (PS1) Marostegui: db-eqiad.php: Fully repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495640
[07:31:45] (CR) Marostegui: [C: +2] db-eqiad.php: Fully repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495640 (owner: Marostegui)
[07:32:41] !log Upgrade MySQL and kernel on pc2010 (spare)
[07:32:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:32:46] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495640 (owner: Marostegui)
[07:33:32] (CR) jenkins-bot: db-eqiad.php: Fully repool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/495640 (owner: Marostegui)
[07:33:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1099 after upgrade (duration: 00m 48s)
[07:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:37] Operations, serviceops: mw1280 crashed - https://phabricator.wikimedia.org/T218006 (MoritzMuehlenhoff) a:Cmjohnson The server has broken memory (and warranty expires in a month): ` Record: 43 Date/Time: 03/10/2019 07:53:15 Source: system Severity: Critical Description: Correctable mem...
[07:53:46] Operations, ops-eqiad, serviceops: mw1280 crashed - https://phabricator.wikimedia.org/T218006 (MoritzMuehlenhoff)
[08:01:12] (CR) Muehlenhoff: "This means that we can no longer use the openldap packages from stretch-security, but in the case of openldap vulnerabilities need to roll" [puppet] - https://gerrit.wikimedia.org/r/495490 (https://phabricator.wikimedia.org/T217280) (owner: GTirloni)
[08:02:46] !log Upgrade pc1010 (spare)
[08:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:04:25] Operations, Office-IT, Wikimedia-Mailing-lists, CommRel-Specialists-Support (Jan-Mar-2019): Mailing list migration for Arbitration Committee to Google Group - https://phabricator.wikimedia.org/T215940 (WormTT) Oh, I should mention, because i don't see it anywhere on this ticket. We transferred...
[08:17:26] (CR) DCausse: [C: +1] "thanks!" [mediawiki-config] - https://gerrit.wikimedia.org/r/494755 (https://phabricator.wikimedia.org/T215491) (owner: Giuseppe Lavagetto)
[08:44:17] Puppet, cloud-services-team: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (Az1568)
[08:57:02] Puppet, cloud-services-team: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (Az1568) Got another email stating that puppet is now failing on tools-sgeexec-0905.tools.eqiad.wmflabs as well, so updating phab title.
[08:57:15] Puppet, cloud-services-team: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs and tools-sgeexec-0905.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (Az1568)
[09:07:05] (PS1) Gilles: Normalize thumbnail URLs to avoid cachebusting [puppet] - https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339)
[09:07:42] Operations, Proton, Core Platform Team Backlog (Watching / External), Reading-Infrastructure-Team-Backlog (Kanban), Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (MoritzMuehlenhoff) It seems this was caused by an upstream regression i...
[09:14:06] (CR) Gilles: "@Ema is there a way to run the vtc tests in CI?" [puppet] - https://gerrit.wikimedia.org/r/495643 (https://phabricator.wikimedia.org/T216339) (owner: Gilles)
[09:18:36] (Abandoned) Jcrespo: parsercache: Reduce retention time to 7 days due to running out of space [puppet] - https://gerrit.wikimedia.org/r/466036 (owner: Jcrespo)
[09:21:38] (PS10) Jcrespo: mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292)
[09:22:31] (CR) jerkins-bot: [V: -1] mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[09:24:32] (PS2) Jcrespo: Revert "nrpe: Don't set PrivateTmp=True" [puppet] - https://gerrit.wikimedia.org/r/464601
[09:25:05] (CR) Volans: [C: +1] "LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) (owner: Jbond)
[09:25:23] (CR) Jcrespo: "@alex I am not sure if my rebase was correct, was user/group removed for a reason?"
[puppet] - https://gerrit.wikimedia.org/r/464601 (owner: Jcrespo)
[09:27:57] (CR) Jcrespo: "I think it was on purpose based on cab6d7101eaa93d37adcc271a4d69059d921747c" [puppet] - https://gerrit.wikimedia.org/r/464601 (owner: Jcrespo)
[09:28:45] (PS3) Jcrespo: Revert "nrpe: Don't set PrivateTmp=True" [puppet] - https://gerrit.wikimedia.org/r/464601
[09:30:27] (PS1) Joal: Update AQS druid datasource to 2019_02 [puppet] - https://gerrit.wikimedia.org/r/495646
[09:30:37] elukey: --^
[09:31:38] (CR) Jcrespo: "Please see how or if you want to deploy this." [puppet] - https://gerrit.wikimedia.org/r/464601 (owner: Jcrespo)
[09:31:42] (CR) Elukey: [C: +2] Update AQS druid datasource to 2019_02 [puppet] - https://gerrit.wikimedia.org/r/495646 (owner: Joal)
[09:33:07] (PS2) Jcrespo: mariadb: Remove conditional for system user [puppet] - https://gerrit.wikimedia.org/r/461035 (https://phabricator.wikimedia.org/T100501)
[09:37:37] (CR) Marostegui: "Will this affect labsdb1004 and labsdb1005? What is the status of those? Can those be decommissioned if yes?" [puppet] - https://gerrit.wikimedia.org/r/461035 (https://phabricator.wikimedia.org/T100501) (owner: Jcrespo)
[09:40:21] (CR) Volans: [C: -1] "Nice! (but there is a typo ;) ) Looks good to me otherwise. A couple of optional improvements inline." (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/494976 (owner: Jbond)
[09:44:03] !log roll restart of aqs on aqs100* to pick up new druid settings
[09:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:23] !log installing chromium security updates on proton1001
[09:44:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:03] (CR) Volans: [C: +1] "LGTM, make sure to run the compiler also on the ganeti hosts to ensure it's a noop."
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[09:55:15] (CR) Arturo Borrero Gonzalez: [C: -1] "Sorry I think perhaps I introduced too many comments not strictly related to your patch." [puppet] - https://gerrit.wikimedia.org/r/490197 (https://phabricator.wikimedia.org/T210818) (owner: GTirloni)
[09:56:13] !log installing chromium security updates on remaining proton hosts
[09:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:59:04] Operations, ops-eqiad, decommission: Decommission conf100[1-3] - https://phabricator.wikimedia.org/T206626 (MoritzMuehlenhoff) @RobH conf1001-conf1003 are still visible in debmonitor/puppetdb
[10:01:45] Operations, ops-codfw, Traffic, decommission: Decommission acamar and achernar - https://phabricator.wikimedia.org/T198286 (MoritzMuehlenhoff) @RobH I'd like to use one of the hosts for some installer tests in the next weeks, can we hold decommissioning these for a few weeks?
[10:07:49] (CR) Volans: [C: +1] "Also this requires that the private repo (both labs-private and the puppetmaster one) are updated. I'd suggest to do it in two commits, fi" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[10:13:45] Operations, Proton, Core Platform Team Backlog (Watching / External), Reading-Infrastructure-Team-Backlog (Kanban), Services (watching): Proton fails with Chromium 72.0.3626.96 - https://phabricator.wikimedia.org/T216493 (MoritzMuehlenhoff) Open→Resolved The Chromium update has been r...
[10:20:15] (CR) Volans: [C: -1] "RAPI uses a self-signed certificate, see inline" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[10:21:20] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/495650 (https://phabricator.wikimedia.org/T128546)
[10:30:04] jan_drewniak: How many deployers does it take to do Wikimedia Portals Update deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T1030).
[10:34:36] (CR) Volans: [C: -1] "Some incongruences with the config file from the puppet patch and some exception handling missing, but is mostly there." (10 comments) [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[10:36:24] (PS1) Marostegui: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/495663
[10:37:38] (CR) Jdrewniak: [C: +2] Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/495650 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:38:16] (CR) Muehlenhoff: [C: +1] "Looks good (with the labsdb1004/1005 caveat)" [puppet] - https://gerrit.wikimedia.org/r/461035 (https://phabricator.wikimedia.org/T100501) (owner: Jcrespo)
[10:38:44] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/495650 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:41:58] jan_drewniak: Can I deploy real quick a change on wmf-config once you've synced?
[10:42:18] @marostegui yup I'll just be a sec
[10:42:27] thanks!
[10:42:39] jan_drewniak: let me know when I can go ahead :)
[10:43:58] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:495650| Bumping portals to master (T128546)]] (duration: 00m 49s)
[10:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:44:04] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[10:44:15] Puppet, cloud-services-team: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs and tools-sgeexec-0905.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (Ineuw) I also received two automated emails, the numbers are different but I am sure that they are related to this bug. tools-sgeexec...
[10:44:48] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:495650| Bumping portals to master (T128546)]] (duration: 00m 49s)
[10:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:45:04] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/495650 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[10:46:36] @marostegui ok I'm done now :)
[10:46:42] jan_drewniak: thanks!
[10:46:46] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/495663 (owner: Marostegui)
[10:47:45] (Merged) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/495663 (owner: Marostegui)
[10:48:52] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097 for upgrade and schema change (duration: 00m 48s)
[10:48:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:50:25] (CR) Santhosh: [C: +1] Enable ExternalGuidance to all Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/493672 (https://phabricator.wikimedia.org/T216129) (owner: KartikMistry)
[10:51:34] (PS8) Mathew.onipe: elasticsearch: refactor elastic icinga checks [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921)
[10:52:42] Operations, Continuous-Integration-Infrastructure, Packaging: Upgrade jenkins-debian-glue to v0.20.0 - https://phabricator.wikimedia.org/T212774 (MoritzMuehlenhoff) >>! In T212774#4897016, @hashar wrote: > * the source package is a Debian native package, seems to qualify for the `main` component. We...
[10:53:49] Operations, SRE-Access-Requests: Grant root on MediaWiki maintenance hosts to perf-roots - https://phabricator.wikimedia.org/T217813 (Vgutierrez) a:kchapman
[10:56:30] (PS9) Mathew.onipe: elasticsearch: refactor elastic icinga checks [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921)
[10:56:39] (CR) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/495663 (owner: Marostegui)
[10:58:24] (PS3) KartikMistry: Enable ExternalGuidance to all Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/493672 (https://phabricator.wikimedia.org/T216129)
[10:58:30] (CR) Muehlenhoff: Add config file and exclude_mounts options to debdeploy (3 comments) [puppet] - https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) (owner: Jbond)
[11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T1100).
[11:00:04] No GERRIT patches in the queue for this window AFAICS.
[11:00:41] wait, it's now?
[11:01:04] * zeljkof mumbles something about time-zones and daylight savings time
[11:01:21] No, I need SWAT
[11:01:43] Jayprakash12345: please add to the calendar, I'll get ready
[11:01:51] Added
[11:03:17] (PS3) Tim Eulitz: Set up exceptions for rollback confirmation [mediawiki-config] - https://gerrit.wikimedia.org/r/494270 (https://phabricator.wikimedia.org/T217436)
[11:03:29] (CR) jerkins-bot: [V: -1] Set up exceptions for rollback confirmation [mediawiki-config] - https://gerrit.wikimedia.org/r/494270 (https://phabricator.wikimedia.org/T217436) (owner: Tim Eulitz)
[11:05:45] zeljkof: I am ready to test the patch, please go ahead to merge the patch.
[11:06:01] zeljkof: DST, I guess.
[11:06:29] kart_: yes, and google calendar events being pinned to US/EU timezone, wrongly :/
[11:06:39] yeah.
[11:06:46] Jayprakash12345: just a second, some timing problems :/
[11:07:22] kart_: want to practice swat skills? ;)
[11:07:40] (PS10) Mathew.onipe: elasticsearch: refactor elastic icinga checks [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921)
[11:07:52] (PS1) Cparle: Fix syntax for MediaInfo depicts config [mediawiki-config] - https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157)
[11:07:56] zeljkof: Sorry, need to rush for some real-life stuff :/
[11:08:08] have fun :)
[11:08:48] (CR) jerkins-bot: [V: -1] Fix syntax for MediaInfo depicts config [mediawiki-config] - https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: Cparle)
[11:08:55] (PS4) Tim Eulitz: Set up exceptions for rollback confirmation [mediawiki-config] - https://gerrit.wikimedia.org/r/494270 (https://phabricator.wikimedia.org/T217436)
[11:10:40] Jayprakash12345: I see some opposition to the patch in phabricator, I don't want to deploy the patch since people are opposed to it
[11:10:46] (PS2) Cparle: Fix syntax for MediaInfo depicts config [mediawiki-config] - https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157)
[11:11:02] please reach consensus in phabricator on what needs to be done
[11:12:20] (PS11) Mathew.onipe: elasticsearch: refactor elastic icinga checks [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921)
[11:13:04] There is no opposition. They just want to see other options
[11:14:08] I cleared all the doubt
[11:14:09] (CR) Zfilipin: "This was scheduled for EU SWAT today, but I didn't deploy it since there is some opposition to it in the Phabricator ticket.
Please reach " [mediawiki-config] - https://gerrit.wikimedia.org/r/495446 (https://phabricator.wikimedia.org/T217486) (owner: Jayprakash12345)
[11:15:05] Ping @Reedy
[11:15:28] Jayprakash12345: sorry, I'm not sure what's going on, and that's a red flag for deployment for me
[11:15:59] Jayprakash12345: please get at least one +1 to the patch from people in the phab ticket
[11:16:10] (CR) Mathew.onipe: "PCC output is Ok: https://puppet-compiler.wmflabs.org/compiler1002/15052/" [puppet] - https://gerrit.wikimedia.org/r/494499 (https://phabricator.wikimedia.org/T214921) (owner: Mathew.onipe)
[11:17:49] Operations, Continuous-Integration-Infrastructure, Packaging: Upgrade jenkins-debian-glue to v0.20.0 - https://phabricator.wikimedia.org/T212774 (hashar) On Jessie we had it in `jessie-wikimedia/main`, which predates the introduction of `component/xxx` in our apt repositories. So we have: ` $ apt-ca...
[11:21:44] (PS8) Jbond: Cookbook to reset ipmi passwords [cookbooks] - https://gerrit.wikimedia.org/r/494976
[11:22:50] Operations, Security, Surveys: Re-evaluate Limesurvey - https://phabricator.wikimedia.org/T109606 (Aklapper) Stalled→Declined
[11:25:27] (PS1) Tim Eulitz: Add default user config for rollback confirmation [mediawiki-config] - https://gerrit.wikimedia.org/r/495667 (https://phabricator.wikimedia.org/T217436)
[11:28:19] (CR) Jbond: "> Patch Set 7: Code-Review-1" (3 comments) [cookbooks] - https://gerrit.wikimedia.org/r/494976 (owner: Jbond)
[11:31:48] (PS1) GTirloni: profile::base::labs - Ability to disable Puppet failure emails [puppet] - https://gerrit.wikimedia.org/r/495670 (https://phabricator.wikimedia.org/T218009)
[11:34:38] Operations, Cloud-VPS, Toolforge, LDAP, Patch-For-Review: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 (GTirloni) >>!
In T217280#5013940, @Stashbot wrote: > {nav icon=file, name=Mentioned in SAL (#wikimedia-operation... [11:36:35] (03CR) 10Muehlenhoff: Update wmf-auto-restarts to read exclude mounts from debdeploy config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [11:49:35] 10Puppet, 10cloud-services-team, 10Patch-For-Review: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs and tools-sgeexec-0905.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (10GTirloni) [[ https://github.com/wikimedia/puppet/blob/production/modules/base/files/labs/notify_maintainers.py#L... [11:49:44] (03CR) 10Volans: "> not sure you saw this comment" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [11:51:19] (03PS5) 10Jbond: Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) [11:52:25] (03CR) 10Jbond: Add config file and exclude_mounts options to debdeploy (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [11:58:45] (03PS6) 10Jbond: Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) [12:02:45] (03PS4) 10Jbond: Update wmf-auto-restarts to read exclude mounts from debdeploy config [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) [12:03:17] 10Operations, 10MobileFrontend, 10TechCom, 10Traffic, 10Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (10ovasileva) >>! In T214998#5010890, @dr0ptp4kt wrote: > ^ Well, I intended for that to be on e... 
[12:04:32] (03CR) 10Jbond: Update wmf-auto-restarts to read exclude mounts from debdeploy config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [12:05:47] !log updating slapd on serpens/codfw to test possible fix for memory leaks [12:05:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:44] 10Operations, 10ops-codfw, 10cloud-services-team (Kanban): Hardware decommission: labtestcontrol2001.wikimedia.org - https://phabricator.wikimedia.org/T218021 (10aborrero) [12:11:49] PROBLEM - Labs LDAP on serpens is CRITICAL: Could not bind to the LDAP server [12:11:59] PROBLEM - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 237 bytes in 0.250 second response time https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring [12:12:10] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestcontrol2001.wikimedia.org - https://phabricator.wikimedia.org/T218021 (10aborrero) [12:12:29] known I guess, gtirloni ? [12:12:32] gtirloni: expected? 
[12:12:35] mmm [12:12:40] I'm looking [12:13:07] RECOVERY - Labs LDAP on serpens is OK: LDAP OK - 0.008 seconds response time [12:13:18] RECOVERY - toolschecker: Test LDAP for query on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.410 second response time https://wikitech.wikimedia.org/wiki/Help:Toolforge/Monitoring [12:13:37] oh, so LDAp on serpens is CRITICAL seems to be the original issue [12:15:20] arturo: gtirloni was updating slapd, see SAL above [12:15:48] I see, thanks volans [12:18:02] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestservices2001.wikimedia.org - https://phabricator.wikimedia.org/T218022 (10aborrero) [12:20:50] (03CR) 10Volans: "LGTM. I'll leave it up to you in case you want to add an 'if' based on the distro name for now before all trusty are gone" [puppet] - 10https://gerrit.wikimedia.org/r/494681 (https://phabricator.wikimedia.org/T213527) (owner: 10Muehlenhoff) [12:21:48] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10aborrero) [12:23:27] !log updated slapd to version 2.4.47 on serpens (T217280) [12:23:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:30] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280 [12:25:23] (03CR) 10Volans: Add LimitCORE support for uwsgi units. 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493294 (owner: 10CRusnov) [12:28:41] (03CR) 10GTirloni: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/495490 (https://phabricator.wikimedia.org/T217280) (owner: 10GTirloni) [12:34:22] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommmision: labtestweb2001.wikimedia.org - https://phabricator.wikimedia.org/T218024 (10aborrero) [12:35:59] (03CR) 10Muehlenhoff: [C: 03+1] Add LimitCORE support for uwsgi units. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493294 (owner: 10CRusnov) [12:39:02] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: cloudnet2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T218025 (10aborrero) [12:43:31] (03CR) 10Muehlenhoff: [C: 03+1] Update wmf-auto-restarts to read exclude mounts from debdeploy config [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [12:44:41] (03CR) 10Volans: Add LimitCORE support for uwsgi units. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493294 (owner: 10CRusnov) [12:45:41] (03PS9) 10Jbond: Cookbook to reset ipmi passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 [12:46:40] (03CR) 10Volans: [C: 04-1] "I might miss some more recent context here, but didn't we decided to use a proxy here?" (035 comments) [software/netbox-reports] - 10https://gerrit.wikimedia.org/r/495267 (owner: 10CRusnov) [12:46:59] (03CR) 10Matthias Mullie: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: 10Cparle) [12:47:20] (03CR) 10Jbond: "I added an additional log entry as i think it is nicer to have it in an easy to copy paste format" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [12:51:30] (03CR) 10Volans: [C: 03+1] "Nice! 
LGTM, just one nit inline (documentation's fault ;) )" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [12:55:20] (03PS7) 10Jbond: Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) [12:58:09] (03PS1) 10Petar.petkovic: Add publish restrictions config for enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495677 (https://phabricator.wikimedia.org/T217237) [12:59:25] (03CR) 10Petar.petkovic: Add publish restrictions config for enwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495677 (https://phabricator.wikimedia.org/T217237) (owner: 10Petar.petkovic) [12:59:27] (03PS10) 10Jbond: Cookbook to reset ipmi passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 [12:59:36] (03CR) 10Jbond: Cookbook to reset ipmi passwords (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [13:04:50] !log upgrading mwdebug2002 to php 7.2.16 [13:04:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:53] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10aborrero) p:05Triage→03Normal [13:05:43] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10aborrero) [13:06:00] (03CR) 10Volans: [C: 03+1] "LGTM! To merge it just +2 it (no submit), CI will run and auto-submit it. 
To deploy just wait next puppet run on the cumin[12]001 hosts or" [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [13:06:57] (03CR) 10Jbond: [C: 03+2] Cookbook to reset ipmi passwords [cookbooks] - 10https://gerrit.wikimedia.org/r/494976 (owner: 10Jbond) [13:07:10] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestcontrol2001.wikimedia.org - https://phabricator.wikimedia.org/T218021 (10aborrero) [13:07:19] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestservices2001.wikimedia.org - https://phabricator.wikimedia.org/T218022 (10aborrero) [13:07:32] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommmision: labtestweb2001.wikimedia.org - https://phabricator.wikimedia.org/T218024 (10aborrero) [13:07:43] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: cloudnet2001-dev.codfw.wmnet - https://phabricator.wikimedia.org/T218025 (10aborrero) [13:15:38] (03PS1) 10Jbond: Add component/ci wikimedia repository to CI hosts [puppet] - 10https://gerrit.wikimedia.org/r/495681 (https://phabricator.wikimedia.org/T212774) [13:19:57] 10Operations, 10Continuous-Integration-Infrastructure, 10Packaging, 10Patch-For-Review: Upgrade jenkins-debian-glue to v0.20.0 - https://phabricator.wikimedia.org/T212774 (10jbond) i have built this and added it to `{jessie,stretch}-wikimedia` in `component/ci`. It appears that the CI servers did not hav... [13:22:43] 10Puppet, 10cloud-services-team, 10Patch-For-Review: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs and tools-sgeexec-0905.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (10Chicocvenancio) I was a bit confused as well not being aware of T210432 and @GTirloni fix there. I continue bei... 
[13:27:08] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10aborrero) [13:28:02] !log disable active checks in icinga for labtestvirt200[12] (T218023) [13:28:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:05] T218023: Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 [13:29:23] 10Operations, 10ops-codfw, 10decommission, 10cloud-services-team (Kanban): Hardware decommission: labtestvirt200[12].codfw.wmnet - https://phabricator.wikimedia.org/T218023 (10aborrero) [13:36:42] (03CR) 10KartikMistry: Add publish restrictions config for enwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495677 (https://phabricator.wikimedia.org/T217237) (owner: 10Petar.petkovic) [13:58:04] !log Upgrade mysql on db1097 [13:58:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:37] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495692 [14:08:55] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Slowly repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495692 (owner: 10Marostegui) [14:09:04] !log importing build of PHP 7.2.16 for component/php72 (T216712) [14:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:07] T216712: Switch PHP 7.2 packages to an internal component - https://phabricator.wikimedia.org/T216712 [14:09:56] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495692 (owner: 10Marostegui) [14:10:18] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495692 (owner: 10Marostegui) [14:10:28] (03PS1) 10Mathew.onipe: prometheus::alerts: mjolnir bulk update lag [puppet] - 
10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) [14:10:53] (03CR) 10Muehlenhoff: [C: 03+1] Add component/ci wikimedia repository to CI hosts [puppet] - 10https://gerrit.wikimedia.org/r/495681 (https://phabricator.wikimedia.org/T212774) (owner: 10Jbond) [14:11:40] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Slowly repool db1097 (duration: 00m 47s) [14:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:41] (03CR) 10Muehlenhoff: [C: 03+1] Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [14:14:14] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2513 MB (5% inode=55%) [14:20:26] hashar^ [14:20:36] bah [14:20:40] thank you [14:21:33] !log upgrading mwdebug servers to PHP 7.2.16 [14:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:42] (03CR) 10Jbond: [C: 03+2] Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [14:22:52] (03PS8) 10Jbond: Add config file and exclude_mounts options to debdeploy [puppet] - 10https://gerrit.wikimedia.org/r/494764 (https://phabricator.wikimedia.org/T217646) [14:25:32] !log contint1001: stopping zuul-merger (it is cpu or IO starving the server) [14:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:23] (03PS5) 10Jbond: Update wmf-auto-restarts to read exclude mounts from debdeploy config [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) [14:27:58] (03PS1) 10Marostegui: db-eqiad.php: Give more traffic to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495695 [14:29:07] (03CR) 10Marostegui: "Can you give me some hints on how to test this?" 
[puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [14:30:02] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Give more traffic to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495695 (owner: 10Marostegui) [14:30:28] (03CR) 10Jbond: [C: 03+2] Update wmf-auto-restarts to read exclude mounts from debdeploy config [puppet] - 10https://gerrit.wikimedia.org/r/494765 (https://phabricator.wikimedia.org/T217646) (owner: 10Jbond) [14:31:07] (03Merged) 10jenkins-bot: db-eqiad.php: Give more traffic to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495695 (owner: 10Marostegui) [14:32:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 48s) [14:32:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:16] (03CR) 10Jcrespo: "> Can you give me some hints on how to test this?" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [14:32:30] PROBLEM - zuul_merger_service_running on contint1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger [14:33:26] (03CR) 10jenkins-bot: db-eqiad.php: Give more traffic to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495695 (owner: 10Marostegui) [14:36:08] RECOVERY - zuul_merger_service_running on contint1001 is OK: PROCS OK: 1 process with regex args ^/usr/share/python/zuul/bin/python /usr/bin/zuul-merger [14:43:39] !log upgrading mw canaries to PHP 7.2.16 [14:43:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:51] (03PS1) 10Marostegui: db-eqiad.php: More weight to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495696 [14:52:30] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: More weight to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495696 (owner: 10Marostegui) 
[14:53:27] (03Merged) 10jenkins-bot: db-eqiad.php: More weight to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495696 (owner: 10Marostegui) [14:55:12] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More traffic to db1097 (duration: 00m 49s) [14:55:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:47] (03CR) 10jenkins-bot: db-eqiad.php: More weight to db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495696 (owner: 10Marostegui) [14:57:05] (03CR) 10Marostegui: "is daily_snapshot.py meant to be run manually or should we use backup_mariadb.py in case we want to do a one time s2 snapshot?" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:07:04] (03CR) 10Jcrespo: "> is daily_snapshot.py meant to be run manually or should we use" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:07:54] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495697 [15:08:06] (03CR) 10Jcrespo: "> > is daily_snapshot.py meant to be run manually or should we use" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:08:24] (03PS3) 10Matthias Mullie: Fix syntax for MediaInfo depicts config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: 10Cparle) [15:08:33] (03CR) 10Jcrespo: "> > > is daily_snapshot.py meant to be run manually or should we use" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:09:16] (03CR) 10Marostegui: "> > is daily_snapshot.py meant to be run manually or should we use" [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:11:48] 10Puppet, 
10cloud-services-team, 10Patch-For-Review: Puppet failure on tools-exec-1430.tools.eqiad.wmflabs and tools-sgeexec-0905.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218009 (10bd808) Emailing to all members of the project is not desirable for at least the bastion and tools projects. In p... [15:11:53] (03CR) 10Matthias Mullie: [C: 03+2] Fix syntax for MediaInfo depicts config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: 10Cparle) [15:12:55] (03Merged) 10jenkins-bot: Fix syntax for MediaInfo depicts config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: 10Cparle) [15:16:36] !log mlitn@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Fix syntax for MediaInfo depicts config (beta only) (duration: 00m 49s) [15:16:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:40] (03CR) 10jenkins-bot: Fix syntax for MediaInfo depicts config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495666 (https://phabricator.wikimedia.org/T217157) (owner: 10Cparle) [15:21:06] (03CR) 10DCausse: [C: 04-1] prometheus::alerts: mjolnir bulk update lag (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) (owner: 10Mathew.onipe) [15:22:28] 10Operations, 10Analytics, 10RESTBase, 10Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (10Pchelolo) 05Open→03Resolved a:03Pchelolo Thank you, everyone! 
:) [15:24:15] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Fully repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495697 (owner: 10Marostegui) [15:25:17] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495697 (owner: 10Marostegui) [15:26:19] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Fully repool db1097 (duration: 00m 48s) [15:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:40] 10Operations, 10MobileFrontend, 10TechCom, 10Traffic, 10Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (10dr0ptp4kt) Thanks @ovasileva. [15:28:24] (03CR) 10Jcrespo: "Wait a bit, I am uploading a new version soon, it had large bugs." [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:29:47] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1097 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495697 (owner: 10Marostegui) [15:33:48] (03CR) 10Vgutierrez: "the script in https://phabricator.wikimedia.org/P8176 can be used to migrate from the old directories schema to the new one" [software/acme-chief] - 10https://gerrit.wikimedia.org/r/494956 (https://phabricator.wikimedia.org/T207295) (owner: 10Vgutierrez) [15:37:47] (03PS2) 10Mathew.onipe: prometheus::alerts: mjolnir bulk update lag [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) [15:38:23] (03CR) 10Mathew.onipe: prometheus::alerts: mjolnir bulk update lag (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) (owner: 10Mathew.onipe) [15:38:35] (03PS11) 10Jcrespo: mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - 10https://gerrit.wikimedia.org/r/494899 
(https://phabricator.wikimedia.org/T210292) [15:39:27] (03CR) 10jerkins-bot: [V: 04-1] mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [15:39:50] (03CR) 10WMDE-Fisch: [C: 04-1] Set up exceptions for rollback confirmation (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494270 (https://phabricator.wikimedia.org/T217436) (owner: 10Tim Eulitz) [15:48:46] (03CR) 10DCausse: "it lacks the lag check which can be extracted from this dashboard: https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1 ci" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) (owner: 10Mathew.onipe) [15:50:27] (03CR) 10Filippo Giunchedi: [C: 03+1] Stop using transitional package names for Icinga plugins [puppet] - 10https://gerrit.wikimedia.org/r/494681 (https://phabricator.wikimedia.org/T213527) (owner: 10Muehlenhoff) [15:50:40] (03PS12) 10Jcrespo: mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) [15:51:31] (03CR) 10jerkins-bot: [V: 04-1] mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - 10https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: 10Jcrespo) [16:05:55] (03CR) 10CRusnov: "> Patch Set 11: Code-Review-1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [16:21:50] (03PS1) 10Jbond: Use new excluded_mounts config option [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495705 [16:28:33] (03PS1) 10Muehlenhoff: Add option to debdeploy server to also show automated restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495707 [16:28:35] (03PS1) 10Muehlenhoff: Update debian/changelog for new 
deb build/release [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495708 [16:31:23] !log jbond@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset [16:31:23] !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) [16:31:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:31:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:19] (03PS3) 10Mathew.onipe: prometheus::alerts: mjolnir bulk update lag [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) [16:51:23] (03CR) 10Jbond: Add option to debdeploy server to also show automated restarts (033 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495707 (owner: 10Muehlenhoff) [16:52:38] (03CR) 10Jbond: Use new excluded_mounts config option (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495705 (owner: 10Jbond) [17:00:04] gehel and onimisionipe: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T1700). 
[17:00:52] here here [17:02:11] (03CR) 10Muehlenhoff: Add option to debdeploy server to also show automated restarts (033 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495707 (owner: 10Muehlenhoff) [17:02:27] (03PS2) 10Muehlenhoff: Add option to debdeploy server to also show automated restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495707 [17:02:58] !log jbond@cumin1001 START - Cookbook sre.hosts.ipmi-password-reset [17:02:58] !log jbond@cumin1001 END (FAIL) - Cookbook sre.hosts.ipmi-password-reset (exit_code=99) [17:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:03:40] jbond42: I'm around if needed ;) (ref ^^^) [17:04:14] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10Volans) I've had a chat with @ayounsi about this. While I agree what all that @faidon said in T205897#4953769, I actually came to a different conclusion. The problem is that `port 1` doesn't mean anything in a reliable wa... 
[17:05:52] volans: thanks, just testing there are a few things i need to add like don't log here with --dry-run :) [17:06:37] it should be already the case [17:06:47] irc_logger is not set up when dry-run is set [17:06:50] ahhhh sorry my bad [17:07:33] no, I thought you had a dry-run option in the script [17:07:39] volans: in fact the errors above were without dry run, however i have not put any dry_run decoration in my cookbook [17:07:48] but no, as you shouldn't given that the dry-run is in the cookbooks options itself [17:08:29] cookbook -d sre.hosts.ipmi-password-reset OPTIONS [17:08:44] dry-run and verbose are managed by the cookbook script itself [17:08:54] and propagated to the Spicerack instance passed to the cookbooks [17:09:00] so you get them for free ;) [17:09:14] see 'cookbook -h' on a cumin host [17:10:12] it's quickly mentioned in https://doc.wikimedia.org/spicerack/master/introduction.html#spicerack-library [17:10:33] jbond42: ^^^ [17:10:55] this is the diff i have https://phabricator.wikimedia.org/P8177 [17:11:03] are you saying i don't need the if statement? [17:11:07] no need [17:11:46] jbond42: https://github.com/wikimedia/operations-software-spicerack/blob/master/spicerack/log.py#L115 [17:13:01] ok cool, thanks, will still need a small update as https://phabricator.wikimedia.org/P8177$7 is wrong [17:13:47] also do you normally just let cookbooks stacktrace or is the intention to try and catch exceptions and handle locally? [17:13:47] (03PS1) 10CRusnov: Add certs for RAPI and adjust config to use them [puppet] - 10https://gerrit.wikimedia.org/r/495717 [17:13:55] volans: ^^ [17:14:25] (03CR) 10jerkins-bot: [V: 04-1] Add certs for RAPI and adjust config to use them [puppet] - 10https://gerrit.wikimedia.org/r/495717 (owner: 10CRusnov) [17:15:04] (03CR) 10Jbond: [C: 03+1] "LGTM" (031 comment) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/495707 (owner: 10Muehlenhoff) [17:15:43] jbond42: ack for the bugfix, sorry I missed that.
As for the exception handling, the cookbook execution will catch any exception and consider it a failure. So a cookbook can perfectly just raise [17:16:06] if you return None (no return) or 0 it's considered a success [17:16:19] (03PS2) 10CRusnov: Add certs for RAPI and adjust config to use them [puppet] - 10https://gerrit.wikimedia.org/r/495717 [17:16:33] any other return value must be an int <128 and will be used as exit code, but there are some reserved (90-99) [17:16:40] see https://doc.wikimedia.org/spicerack/master/introduction.html#api-interface [17:18:11] volans: i was thinking about something like this or should i just leave it as is https://phabricator.wikimedia.org/P8178 [17:18:47] what would be the difference? [17:19:03] no stacktrace in the console [17:19:06] (03PS1) 10Arturo Borrero Gonzalez: aptrepo: create base thirdparty component in stretch-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/495718 (https://phabricator.wikimedia.org/T215605) [17:19:18] no traceback even [17:19:46] it would be recorded anyway ;) [17:19:58] ok cool i'll leave it as is [17:20:00] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] aptrepo: create base thirdparty component in stretch-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/495718 (https://phabricator.wikimedia.org/T215605) (owner: 10Arturo Borrero Gonzalez) [17:20:01] see cookbook.py:407 [17:20:41] if you want to exit clean [17:20:44] :) ok [17:20:49] you have to log something and return an int [17:21:01] that is not 0 :D [17:21:07] that's up to you ofc [17:21:25] no happy to leave it thanks [17:23:03] !log T215605 copy python-oath from jessie-wikimedia/thirdparty to stretch-wikimedia/thirdparty in reprepro [17:23:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:23:06] T215605: cloudvps: missing packages in stretch for cloudcontrol servers - https://phabricator.wikimedia.org/T215605 [17:25:34] (03PS1) 10Jbond: Use spicerack.irc_logger object directly.
also it's not callable [cookbooks] - 10https://gerrit.wikimedia.org/r/495719 [17:26:07] (03CR) 10Volans: [C: 03+1] "LGTM" [cookbooks] - 10https://gerrit.wikimedia.org/r/495719 (owner: 10Jbond) [17:27:17] (03CR) 10DCausse: [C: 04-1] prometheus::alerts: mjolnir bulk update lag (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) (owner: 10Mathew.onipe) [17:27:35] (03CR) 10Jbond: [C: 03+2] Use spicerack.irc_logger object directly. also it's not callable [cookbooks] - 10https://gerrit.wikimedia.org/r/495719 (owner: 10Jbond) [17:39:32] 10Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (10ayounsi) [17:39:41] (03PS1) 10Addshore: BETA, commons wikidata foreign repo, use d for IW prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495721 [17:40:05] (03CR) 10Muehlenhoff: "No, this is intentional, from stretch onwards those are split up properly. Please revert (and wait for reviews instead of self-merging rep" [puppet] - 10https://gerrit.wikimedia.org/r/495718 (https://phabricator.wikimedia.org/T215605) (owner: 10Arturo Borrero Gonzalez) [17:40:06] jouncebot: now [17:40:07] No deployments scheduled for the next 0 hour(s) and 19 minute(s) [17:40:10] jouncebot: next [17:40:10] In 0 hour(s) and 19 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T1800) [17:40:27] o/ If noone objects I'm going to squeeze a beta only patch out before swat [17:41:17] (03CR) 10Addshore: [C: 03+2] BETA, commons wikidata foreign repo, use d for IW prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495721 (owner: 10Addshore) [17:42:17] (03Merged) 10jenkins-bot: BETA, commons wikidata foreign repo, use d for IW prefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/495721 (owner: 10Addshore) [17:43:59] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY 
https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495721/ (duration: 00m 49s)
[17:44:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:44:04] thats me out of the way
[17:44:19] Operations: Netbox: fill network topology - https://phabricator.wikimedia.org/T205897 (ayounsi) All eqsin links are now in Netbox.
[17:47:22] (CR) jenkins-bot: BETA, commons wikidata foreign repo, use d for IW prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/495721 (owner: Addshore)
[17:51:19] * addshore will have one more
[17:52:21] (PS1) Addshore: BETA, fix d: wikidata target (no www.) [mediawiki-config] - https://gerrit.wikimedia.org/r/495723
[17:52:36] (CR) Addshore: [C: +2] BETA, fix d: wikidata target (no www.) [mediawiki-config] - https://gerrit.wikimedia.org/r/495723 (owner: Addshore)
[17:53:36] (Merged) jenkins-bot: BETA, fix d: wikidata target (no www.) [mediawiki-config] - https://gerrit.wikimedia.org/r/495723 (owner: Addshore)
[17:53:38] (CR) Arturo Borrero Gonzalez: [C: +2] "> No, this is intentional, from stretch onwards those are split up" [puppet] - https://gerrit.wikimedia.org/r/495718 (https://phabricator.wikimedia.org/T215605) (owner: Arturo Borrero Gonzalez)
[17:54:48] (PS1) Arturo Borrero Gonzalez: Revert "aptrepo: create base thirdparty component in stretch-wikimedia" [puppet] - https://gerrit.wikimedia.org/r/495724
[17:55:42] !log addshore@deploy1001 Synchronized wmf-config/interwiki-labs.php: BETA ONLY https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/495723/ (duration: 00m 48s)
[17:55:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:55:48] okay, all clear for swat :)
[17:56:35] (PS12) CRusnov: Add configuration for the ganeti->netbox sync. [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229)
[17:56:54] (CR) CRusnov: Add configuration for the ganeti->netbox sync. (3 comments) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[17:57:10] (CR) jerkins-bot: [V: -1] Add configuration for the ganeti->netbox sync. [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[17:58:36] (CR) jenkins-bot: BETA, fix d: wikidata target (no www.) [mediawiki-config] - https://gerrit.wikimedia.org/r/495723 (owner: Addshore)
[17:59:00] (PS13) CRusnov: Add configuration for the ganeti->netbox sync. [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229)
[18:00:05] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T1800)
[18:00:05] No GERRIT patches in the queue for this window AFAICS.
[18:00:20] (PS1) Arturo Borrero Gonzalez: aptrepo: rename thirdparty component in stretch-wikimedia to be more concrete [puppet] - https://gerrit.wikimedia.org/r/495727 (https://phabricator.wikimedia.org/T215605)
[18:00:58] (CR) jerkins-bot: [V: -1] aptrepo: rename thirdparty component in stretch-wikimedia to be more concrete [puppet] - https://gerrit.wikimedia.org/r/495727 (https://phabricator.wikimedia.org/T215605) (owner: Arturo Borrero Gonzalez)
[18:02:42] (PS2) Arturo Borrero Gonzalez: aptrepo: rename thirdparty component in stretch-wikimedia to be more concrete [puppet] - https://gerrit.wikimedia.org/r/495727 (https://phabricator.wikimedia.org/T215605)
[18:03:01] (Abandoned) Arturo Borrero Gonzalez: Revert "aptrepo: create base thirdparty component in stretch-wikimedia" [puppet] - https://gerrit.wikimedia.org/r/495724 (owner: Arturo Borrero Gonzalez)
[18:04:16] Operations, ops-eqiad, decommission: Decommission conf100[1-3] - https://phabricator.wikimedia.org/T206626 (RobH) >>! In T206626#5014723, @MoritzMuehlenhoff wrote: > @RobH conf1001-conf1003 are still visible in debmonitor/puppetdb Indeed, I screwed up and didn't run the decom script, but checked off...
[18:06:26] (CR) Muehlenhoff: "> Could you please point me to any docs in which such policy is documented?" [puppet] - https://gerrit.wikimedia.org/r/495718 (https://phabricator.wikimedia.org/T215605) (owner: Arturo Borrero Gonzalez)
[18:13:18] Operations, Analytics, Analytics-Kanban, Wikimedia-Stream, and 2 others: Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (Nuria) Open→Resolved
[18:13:53] Operations, Analytics, Analytics-Kanban, User-Elukey: Review znodes on Zookeeper cluster to possibly remove not-used data - https://phabricator.wikimedia.org/T216979 (Nuria) Open→Resolved
[18:17:57] (PS15) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)
[18:18:08] (CR) CRusnov: Add ganeti->netbox sync script (10 comments) [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[18:20:18] (PS1) GTirloni: cloudvps: Do not send Puppet failure emails for tools/bastion projects [puppet] - https://gerrit.wikimedia.org/r/495729 (https://phabricator.wikimedia.org/T218009)
[18:27:23] (CR) Volans: [C: +1] "Change looks sane, one question inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/495717 (owner: CRusnov)
[18:28:05] (PS4) Mathew.onipe: prometheus::alerts: mjolnir bulk update lag [puppet] - https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494)
[18:28:48] (CR) Mathew.onipe: prometheus::alerts: mjolnir bulk update lag (3 comments) [puppet] - https://gerrit.wikimedia.org/r/495693 (https://phabricator.wikimedia.org/T214494) (owner: Mathew.onipe)
[18:30:03] (CR) BryanDavis: [C: +1] cloudvps: Do not send Puppet failure emails for tools/bastion projects [puppet] - https://gerrit.wikimedia.org/r/495729 (https://phabricator.wikimedia.org/T218009) (owner: GTirloni)
[18:30:30] (CR) Volans: Add configuration for the ganeti->netbox sync. (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[18:32:06] (PS2) GTirloni: cloudvps: Do not send Puppet failure emails for tools/bastion projects [puppet] - https://gerrit.wikimedia.org/r/495729 (https://phabricator.wikimedia.org/T218009)
[18:33:19] (CR) GTirloni: [C: +2] cloudvps: Do not send Puppet failure emails for tools/bastion projects [puppet] - https://gerrit.wikimedia.org/r/495729 (https://phabricator.wikimedia.org/T218009) (owner: GTirloni)
[18:36:23] Puppet, cloud-services-team, Patch-For-Review: Puppet failure emails sent to non-admin members of tools project causing user confusion - https://phabricator.wikimedia.org/T218009 (bd808)
[18:36:31] (PS14) CRusnov: Add configuration for the ganeti->netbox sync. [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229)
[18:37:19] Operations, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata, and 8 others: Investigate more efficient memcached solution for CacheAwarePropertyInfoStore - https://phabricator.wikimedia.org/T97368 (elukey) I found another occurrence of timeouts in mcrouter wi...
[18:38:45] (CR) CRusnov: "Silly problem. Thanks :)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[18:43:36] (CR) Volans: [C: +1] "Looks good, double check with the compiler too please." [puppet] - https://gerrit.wikimedia.org/r/493348 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[18:50:12] (CR) Volans: [C: -1] "Nice! Just few minor things." (5 comments) [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[18:57:41] (PS1) Elukey: Raise memcached dedicated memory on mc1019 [puppet] - https://gerrit.wikimedia.org/r/495731 (https://phabricator.wikimedia.org/T217731)
[19:00:34] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/15054/" [puppet] - https://gerrit.wikimedia.org/r/495731 (https://phabricator.wikimedia.org/T217731) (owner: Elukey)
[19:01:14] (PS2) Elukey: Raise memcached dedicated memory on mc1019 [puppet] - https://gerrit.wikimedia.org/r/495731 (https://phabricator.wikimedia.org/T217731)
[19:11:55] (PS13) Jcrespo: mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292)
[19:12:58] (CR) jerkins-bot: [V: -1] mariadb-backup: Deploy new snapshot cycle to cumin and provisioning hosts [puppet] - https://gerrit.wikimedia.org/r/494899 (https://phabricator.wikimedia.org/T210292) (owner: Jcrespo)
[19:27:52] (CR) Muehlenhoff: aptrepo: rename thirdparty component in stretch-wikimedia to be more concrete (1 comment) [puppet] - https://gerrit.wikimedia.org/r/495727 (https://phabricator.wikimedia.org/T215605) (owner: Arturo Borrero Gonzalez)
[19:33:12] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2512 MB (5% inode=56%)
[19:34:01] (PS1) MSantos: Pass flag use_nodejs10 for maps services [puppet] - https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523)
[19:34:57] (CR) jerkins-bot: [V: -1] Pass flag use_nodejs10 for maps services [puppet] - https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523) (owner: MSantos)
[19:41:31] (PS1) Jbond: Fix typo, assigning host errors to the wrong instance [cookbooks] - https://gerrit.wikimedia.org/r/495737
[19:42:44] PROBLEM - Apache HTTP on mw1343 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.073 second response time
[19:42:44] PROBLEM - HHVM rendering on mw1343 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.073 second response time
[19:43:14] PROBLEM - Nginx local proxy to apache on mw1343 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.151 second response time
[19:43:56] RECOVERY - Apache HTTP on mw1343 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.226 second response time
[19:43:56] RECOVERY - HHVM rendering on mw1343 is OK: HTTP OK: HTTP/1.1 200 OK - 80309 bytes in 0.356 second response time
[19:44:26] RECOVERY - Nginx local proxy to apache on mw1343 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 618 bytes in 0.726 second response time
[19:47:15] Operations, Parsoid, RESTBase, Traffic, and 5 others: Consider stashing data-parsoid for VE - https://phabricator.wikimedia.org/T215956 (mobrovac) >>! In T215956#5004383, @ema wrote: >>>! In T215956#4977137, @mobrovac wrote: >> it seems like we will need to add a rule to Varnish to pass on these...
[20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T2000).
[20:13:54] (PS2) MSantos: Pass flag use_nodejs10 for maps services [puppet] - https://gerrit.wikimedia.org/r/495735 (https://phabricator.wikimedia.org/T215523)
[20:14:03] Puppet, cloud-services-team, Patch-For-Review: Puppet failure emails sent to non-admin members of tools project causing user confusion - https://phabricator.wikimedia.org/T218009 (Krenair) a:Krenair I'll have a go at this
[20:30:26] RECOVERY - Disk space on contint1001 is OK: DISK OK
[20:40:08] (PS1) Alex Monk: puppet_alert: Email projectadmins instead of members [puppet] - https://gerrit.wikimedia.org/r/495757 (https://phabricator.wikimedia.org/T218009)
[20:40:31] (CR) jerkins-bot: [V: -1] puppet_alert: Email projectadmins instead of members [puppet] - https://gerrit.wikimedia.org/r/495757 (https://phabricator.wikimedia.org/T218009) (owner: Alex Monk)
[20:43:03] (PS2) Alex Monk: puppet_alert: Email projectadmins instead of members [puppet] - https://gerrit.wikimedia.org/r/495757 (https://phabricator.wikimedia.org/T218009)
[20:50:14] Operations, RESTBase-API, serviceops, Core Platform Team Backlog (Designing), Services (designing): Decide whether to keep violating OpenAPI/Swagger specification in our REST services - https://phabricator.wikimedia.org/T217881 (mobrovac) Conceptually I agree that we should conform to the spe...
[20:53:39] Operations, RESTBase-API, TechCom, serviceops, and 2 others: Decide whether to keep violating OpenAPI/Swagger specification in our REST services - https://phabricator.wikimedia.org/T217881 (mobrovac)
[21:00:04] bawolff and Reedy: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T2100).
[21:04:06] Operations, ops-eqiad: asw2-c-eqiad fpc3 Rear QSFP+ PIC Chan# 1 flapping - https://phabricator.wikimedia.org/T218059 (ayounsi) p:Triage→Normal
[22:00:46] Operations, RESTBase-API, TechCom, serviceops, and 2 others: Decide whether to keep violating OpenAPI/Swagger specification in our REST services - https://phabricator.wikimedia.org/T217881 (Pchelolo) I would say we should update our swagger to 3.0 and become standard-compatible. In 3.0 there's a...
[22:09:20] (CR) Volans: Fix typo, assigning host errors to the wrong instance (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[22:26:44] (CR) Hashar: Add component/ci wikimedia repository to CI hosts (2 comments) [puppet] - https://gerrit.wikimedia.org/r/495681 (https://phabricator.wikimedia.org/T212774) (owner: Jbond)
[22:30:27] jouncebot: next
[22:30:27] In 0 hour(s) and 29 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T2300)
[22:33:20] (PS2) Hashar: profile: point to real modules for specs [puppet] - https://gerrit.wikimedia.org/r/480957
[22:35:46] (CR) Hashar: "To be checked with Giuseppe, the profile tests will no more run when a dependency change since we rely on the content of the .fixtures.yml" [puppet] - https://gerrit.wikimedia.org/r/480957 (owner: Hashar)
[22:48:02] (CR) Jbond: Fix typo, assigning host errors to the wrong instance (2 comments) [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[22:52:08] (CR) Volans: Fix typo, assigning host errors to the wrong instance (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Evening SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190311T2300).
[23:00:05] No GERRIT patches in the queue for this window AFAICS.
[23:03:38] (PS2) Jbond: Fix typo, assigning host errors to the wrong instance [cookbooks] - https://gerrit.wikimedia.org/r/495737
[23:04:08] (CR) Jbond: Fix typo, assigning host errors to the wrong instance (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[23:08:06] (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[23:11:57] (CR) Jbond: [C: +2] Fix typo, assigning host errors to the wrong instance [cookbooks] - https://gerrit.wikimedia.org/r/495737 (owner: Jbond)
[23:56:26] (PS16) CRusnov: Add ganeti->netbox sync script [software/netbox-deploy] - https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229)