[00:33:57] PROBLEM - puppet last run on install2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:53:47] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:08] RECOVERY - puppet last run on install2002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [01:04:08] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479 [01:05:08] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3997066 keys, up 5 minutes 3 seconds - replication_delay is 0 [01:17:07] PROBLEM - recommendation_api endpoints health on scb1002 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received [01:17:57] RECOVERY - recommendation_api endpoints health on scb1002 is OK: All endpoints are healthy [01:22:57] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [02:30:06] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.19) (duration: 07m 17s) [02:30:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:36:48] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Sep 25 02:36:48 UTC 2017 (duration 6m 42s) [02:36:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:27:48] 10Operations, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3630077 (10Marostegui) I can see HW logs and it found this: ``` hpiLO-> show record14 status=0 status_tag=COMMAND COMPLETED Mon Sep 25 04:27:07 2017 /system1/log1/record14 Targets... 
[04:28:54] 10Operations, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3630681 (10Marostegui) >>! In T176573#3630142, @Volans wrote: > @jcrespo interesting, I guess the documentation in https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_DL3N0#Show_system_... [04:34:29] 10Operations, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3630682 (10Marostegui) I have checked other hosts on that same rack (C6) and there are no warnings on ILO or anything related. @Papaul can you visually check the rack to see if there is any temperature wa... [04:37:38] 10Operations, 10DBA, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3630688 (10Marostegui) [04:52:07] RECOVERY - MariaDB Slave Lag: s2 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89960.41 seconds [05:00:59] 10Operations, 10ops-eqiad, 10DBA: Decommission db1049 - https://phabricator.wikimedia.org/T175264#3630693 (10Marostegui) a:03Cmjohnson As per Jaime comment, this is now ready to be handed over Chris. Thanks! [06:14:18] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [06:14:36] Error: 503, Backend fetch failed at Mon, 25 Sep 2017 06:14:08 GMT [06:14:58] Morning [06:15:19] Getting back-end errors again this morning... 
[06:15:22] "Request from 88.97.96.89 via cp3040 cp3040, Varnish XID 257097762 [06:15:23] Error: 503, Backend fetch failed at Mon, 25 Sep 2017 06:14:20 GMT" [06:15:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [06:21:25] 10Operations, 10Puppet, 10User-Joe: Prepare for Puppet 4 - https://phabricator.wikimedia.org/T169548#3630748 (10Joe) [06:21:32] 10Operations, 10Puppet, 10Patch-For-Review, 10User-Joe: Switch all hosts to the future parser - https://phabricator.wikimedia.org/T171704#3630747 (10Joe) 05Open>03Resolved [06:27:57] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:27:57] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time [06:28:27] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [06:28:57] RECOVERY - graphite.wikimedia.org on graphite1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.007 second response time [06:32:27] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [06:33:09] (03PS1) 10Niharika29: Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 [06:33:50] (03CR) 10Niharika29: [C: 032] Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 (owner: 10Niharika29) [06:33:57] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0] [06:33:58] (03CR) 10jerkins-bot: [V: 04-1] Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 (owner: 10Niharika29) [06:34:39] (03CR) 10jerkins-bot: [V: 04-1] Update a message and add some more humor 
[wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 (owner: 10Niharika29) [06:35:54] (03PS2) 10Niharika29: Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 [06:37:39] (03CR) 10Niharika29: [C: 032] Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 (owner: 10Niharika29) [06:38:23] (03Merged) 10jenkins-bot: Update a message and add some more humor [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/380437 (owner: 10Niharika29) [06:38:58] !log installing apache security updates on logstash* [06:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:41:47] jouncebot: refresh [06:41:49] I refreshed my knowledge about deployments. [06:41:54] jouncebot: next [06:41:54] In 4 hour(s) and 18 minute(s): Creation of the Hindi Wikivoyage wiki (task T173013) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1100) [06:41:54] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [06:42:43] * Niharika pats jouncebot [06:48:27] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [06:51:14] <_joe_> uhm [07:00:37] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [07:00:58] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [07:25:10] (03CR) 10Hashar: "That would create Stephen, then he has to be added to a group. 
The task states that is to take over some projects by Zhou (login zhousquar" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [07:26:28] (03PS19) 10Phedenskog: Make values stackable [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) [07:28:34] (03CR) 10Phedenskog: "@Krinkle ah yes of course. I'm not sure how to get new raw data? Can you help me with that and I can fix the tests." [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [07:30:24] (03PS1) 10Marostegui: db-eqiad.php: Add weight to db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380439 [07:31:36] (03PS2) 10Marostegui: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380439 [07:34:28] PROBLEM - HHVM rendering on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [07:34:38] PROBLEM - Nginx local proxy to apache on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time [07:35:18] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380439 (owner: 10Marostegui) [07:35:37] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 80717 bytes in 0.429 second response time [07:35:38] RECOVERY - Nginx local proxy to apache on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.046 second response time [07:37:08] !log add mw132[0,2] to live traffic (mw appservers) - traffic weights will go from 5 to 20 incrementally [07:37:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:37:30] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380439 (owner: 10Marostegui) 
[07:37:39] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380439 (owner: 10Marostegui) [07:39:07] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1055 weight (duration: 00m 47s) [07:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:39:13] marostegui: o/ [07:39:31] elukey: o/! [07:42:04] 10Operations, 10DBA, 10Patch-For-Review: decommission db1018 - https://phabricator.wikimedia.org/T176215#3630783 (10Marostegui) [07:42:21] !log uploaded hhvm 3.18.2+dfsg-1+wmf5+deb9u1 for stretch-wikimedia to apt.wikimedia.org [07:42:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:42:37] 10Operations, 10ops-eqiad, 10DBA: decommission db1018 - https://phabricator.wikimedia.org/T176215#3617573 (10Marostegui) a:03Cmjohnson [07:42:46] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3630788 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` mw1323.eqiad.wmnet ``` The log can be foun... 
[07:43:13] (03CR) 10Muehlenhoff: [C: 031] admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [07:44:21] (03CR) 10Muehlenhoff: [C: 031] contint: php5.5 on permanent slaves [puppet] - 10https://gerrit.wikimedia.org/r/377529 (https://phabricator.wikimedia.org/T174972) (owner: 10Hashar) [07:47:10] (03PS6) 10Muehlenhoff: contint: php5.5 on permanent slaves [puppet] - 10https://gerrit.wikimedia.org/r/377529 (https://phabricator.wikimedia.org/T174972) (owner: 10Hashar) [07:47:52] (03CR) 10Muehlenhoff: [C: 032] contint: php5.5 on permanent slaves [puppet] - 10https://gerrit.wikimedia.org/r/377529 (https://phabricator.wikimedia.org/T174972) (owner: 10Hashar) [07:48:59] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380440 (https://phabricator.wikimedia.org/T176311) [07:49:49] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3630800 (10Marostegui) [07:49:57] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (10Marostegui) [07:51:27] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3620829 (10Marostegui) [07:52:01] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380440 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [07:54:53] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380440 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [07:56:04] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380440 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [07:56:04] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: 
Increase traffic for db1101 - T176311 (duration: 00m 46s) [07:56:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:56:10] T176311: decommission db1036 - https://phabricator.wikimedia.org/T176311 [07:58:51] (03PS1) 10Gilles: Enable asia-specific Navigation Timing metric [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380441 (https://phabricator.wikimedia.org/T169522) [08:00:38] (03PS2) 10Gilles: Enable asia-specific Navigation Timing metric [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380441 (https://phabricator.wikimedia.org/T169522) [08:01:17] PROBLEM - Apache HTTP on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [08:01:37] PROBLEM - HHVM rendering on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [08:02:17] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.069 second response time [08:02:37] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 80693 bytes in 1.171 second response time [08:10:47] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380442 [08:14:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380442 (owner: 10Marostegui) [08:14:27] (03Abandoned) 10Hashar: (WIP) Puppet compile an host via rspec (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/308889 (owner: 10Hashar) [08:15:36] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380442 (owner: 10Marostegui) [08:16:09] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380442 (owner: 10Marostegui) [08:16:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1055 weight 
(duration: 00m 45s) [08:16:40] (03CR) 10Filippo Giunchedi: role::kafka::jumbo::broker: enable Prometheus JMX monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [08:16:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:19:09] (03Abandoned) 10Hashar: swift: feature flag the proxy rewriting [puppet] - 10https://gerrit.wikimedia.org/r/348236 (owner: 10Hashar) [08:19:30] (03Abandoned) 10Hashar: test puppet syntax with future parser [puppet] - 10https://gerrit.wikimedia.org/r/359396 (owner: 10Hashar) [08:21:20] (03PS2) 10Hashar: apt: spec boiler plate [puppet] - 10https://gerrit.wikimedia.org/r/374527 [08:21:22] (03PS5) 10Hashar: apt:pin pref file must not have space [puppet] - 10https://gerrit.wikimedia.org/r/353540 [08:21:35] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380444 (https://phabricator.wikimedia.org/T176311) [08:21:50] (03Abandoned) 10Hashar: (WIP) compile authdns::config [puppet] - 10https://gerrit.wikimedia.org/r/343747 (owner: 10Hashar) [08:22:04] (03Abandoned) 10Hashar: tests: disable ruby output buffering [puppet] - 10https://gerrit.wikimedia.org/r/359457 (owner: 10Hashar) [08:26:38] !log mobrovac@tin Started deploy [restbase/deploy@ab24f70]: Switch mobile-sections to next-gen storage and stop storing the current-day feed in Cassandra - T169940 T176233 [08:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:47] T169940: End of September milestone: Start migration of production use cases. 
- https://phabricator.wikimedia.org/T169940 [08:26:47] T176233: Reduce TTL for the feed end point to 10 minutes - https://phabricator.wikimedia.org/T176233 [08:36:41] !log mobrovac@tin Finished deploy [restbase/deploy@ab24f70]: Switch mobile-sections to next-gen storage and stop storing the current-day feed in Cassandra - T169940 T176233 (duration: 10m 03s) [08:36:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:47] T169940: End of September milestone: Start migration of production use cases. - https://phabricator.wikimedia.org/T169940 [08:36:48] T176233: Reduce TTL for the feed end point to 10 minutes - https://phabricator.wikimedia.org/T176233 [08:49:21] !log uploaded php-luasandbox 2.0.14, hhvm-tidy 0.1.3 and php-wikidiff2 1.4.1 for stretch-wikimedia to apt.wikimedia.org [08:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:49:35] 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3630872 (10fgiunchedi) >>! In T176472#3627551, @akosiaris wrote: > IIRC we opened T130759 because slow IO had indeed cause some minor suffering on our part. If we can avoid migrating back to SATA disks easil... 
[08:58:13] (03PS1) 10Marostegui: db-eqiad.php: Depool db1036 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380448 (https://phabricator.wikimedia.org/T176311) [08:59:12] (03PS2) 10Marostegui: db-eqiad.php: Depool db1036, fully repool db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380448 (https://phabricator.wikimedia.org/T176311) [08:59:33] (03PS1) 10Elukey: Introduce profile::druid::worker [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) [09:01:03] (03PS2) 10Gilles: Expose Thumbor swift username [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376043 (https://phabricator.wikimedia.org/T144479) [09:01:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1036, fully repool db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380448 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [09:02:28] (03PS3) 10Gilles: Expose Thumbor swift username [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376043 (https://phabricator.wikimedia.org/T144479) [09:03:00] (03CR) 10Filippo Giunchedi: [C: 031] icinga: initial whitelist for screen monitoring [puppet] - 10https://gerrit.wikimedia.org/r/377823 (https://phabricator.wikimedia.org/T165348) (owner: 10Dzahn) [09:03:02] (03PS3) 10Gilles: Enable asia-specific Navigation Timing metric [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380441 (https://phabricator.wikimedia.org/T169522) [09:04:19] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1036, fully repool db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380448 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [09:05:31] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1036 and full repool db1101 - T176311 (duration: 00m 46s) [09:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:05:37] T176311: decommission db1036 - https://phabricator.wikimedia.org/T176311 [09:06:06] (03CR) 10jenkins-bot: 
db-eqiad.php: Depool db1036, fully repool db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380448 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [09:07:41] (03PS2) 10Elukey: Introduce profile::druid::worker [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) [09:07:56] !log Add 50GB to db1036 /srv partition - T176311 [09:08:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:09:31] 10Operations, 10DBA, 10Patch-For-Review: decommission db1036 - https://phabricator.wikimedia.org/T176311#3630928 (10Marostegui) a:03Marostegui db1036 is now fully depooled and db1101 is serving as a special slave. Let's give it a couple of days and then proceed to get rid of db1036 from all the config file... [09:11:20] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380450 [09:13:56] (03PS2) 10Hashar: base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 [09:13:57] PROBLEM - HHVM rendering on mw1296 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:14:21] (03PS3) 10Elukey: Introduce profile::druid::worker [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) [09:14:25] (03CR) 10Hashar: "I have added some specs that would have caught them." 
[puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:14:28] (03CR) 10jerkins-bot: [V: 04-1] base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:14:50] (03CR) 10Alexandros Kosiaris: base: invoke fail() instead of error() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:14:56] RECOVERY - HHVM rendering on mw1296 is OK: HTTP OK: HTTP/1.1 200 OK - 80717 bytes in 0.118 second response time [09:16:24] 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3630937 (10MoritzMuehlenhoff) Let's try a Ganeti VM, then. Any objections? If that turns out to be non-ideal, we can still revisit WMF4727 (and buy SSDs on top). [09:17:09] (03PS1) 10Volans: wmf-auto-reimage: improve message when IPMI fails [puppet] - 10https://gerrit.wikimedia.org/r/380452 (https://phabricator.wikimedia.org/T166300) [09:17:51] (03PS2) 10Hashar: graphite: cleanup servers.* [puppet] - 10https://gerrit.wikimedia.org/r/377414 [09:21:00] (03Abandoned) 10Marostegui: db-eqiad.php: Increase weight for db1101 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380444 (https://phabricator.wikimedia.org/T176311) (owner: 10Marostegui) [09:21:17] (03CR) 10Alexandros Kosiaris: [C: 031] "\o/" [puppet] - 10https://gerrit.wikimedia.org/r/380304 (owner: 10Giuseppe Lavagetto) [09:23:08] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380450 (owner: 10Marostegui) [09:25:28] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380450 (owner: 10Marostegui) [09:26:08] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1055 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380450 (owner: 10Marostegui) [09:26:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1055 weight and remove 
main traffic from special slaves (duration: 00m 46s) [09:26:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:41] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/379712 (owner: 10Muehlenhoff) [09:35:39] (03CR) 10Volans: [C: 031] "LGTM it was included only in cluster::management" [puppet] - 10https://gerrit.wikimedia.org/r/379713 (owner: 10Muehlenhoff) [09:40:05] (03CR) 10Volans: [C: 032] "self-merging, testing it in a minute with Luca" [puppet] - 10https://gerrit.wikimedia.org/r/380452 (https://phabricator.wikimedia.org/T166300) (owner: 10Volans) [09:41:34] (03CR) 10Hashar: base: invoke fail() instead of error() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:42:22] (03CR) 10Hashar: base: invoke fail() instead of error() (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:42:54] (03PS3) 10Hashar: base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 [09:45:26] (03PS4) 10Elukey: Introduce profile::druid::worker [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) [09:52:45] (03PS2) 10Muehlenhoff: Stop including a salt master in the cluster management role [puppet] - 10https://gerrit.wikimedia.org/r/379712 [09:54:24] (03CR) 10Muehlenhoff: [C: 032] Stop including a salt master in the cluster management role [puppet] - 10https://gerrit.wikimedia.org/r/379712 (owner: 10Muehlenhoff) [09:58:52] (03CR) 10Alexandros Kosiaris: [C: 032] base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:58:58] (03PS4) 10Alexandros Kosiaris: base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:59:00] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] base: invoke fail() instead of error() [puppet] - 10https://gerrit.wikimedia.org/r/377981 (owner: 10Hashar) [09:59:13] 
(03CR) 10Elukey: "No-op for PCC https://puppet-compiler.wmflabs.org/compiler02/8003/" [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [09:59:38] (03PS1) 10Muehlenhoff: Revert "Stop including a salt master in the cluster management role" [puppet] - 10https://gerrit.wikimedia.org/r/380456 [10:00:31] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631049 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1323.eqiad.wmnet'] ``` and were **ALL** successful. [10:00:53] (03PS8) 10Alexandros Kosiaris: Adds myspell-lv, myspell-uk to aspell-uk to ores::base [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak) [10:00:55] (03CR) 10Alexandros Kosiaris: [C: 032] Adds myspell-lv, myspell-uk to aspell-uk to ores::base [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak) [10:01:07] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Adds myspell-lv, myspell-uk to aspell-uk to ores::base [puppet] - 10https://gerrit.wikimedia.org/r/377327 (https://phabricator.wikimedia.org/T175628) (owner: 10Halfak) [10:01:59] (03PS2) 10Muehlenhoff: Revert "Stop including a salt master in the cluster management role" [puppet] - 10https://gerrit.wikimedia.org/r/380456 [10:02:16] (03PS2) 10Hashar: Decouple profile::ci::docker and arcanist install [puppet] - 10https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267) [10:02:38] 10Operations, 10vm-requests: Site: (QUANTITY) VM request for SERVICE[S] - https://phabricator.wikimedia.org/T176607#3631056 (10akosiaris) [10:02:52] 10Operations, 10vm-requests: EQIAD: 1 VM request for package building - https://phabricator.wikimedia.org/T176607#3631068 (10akosiaris) [10:02:57] (03CR) 10Hashar: "And I have moved an include from the contint module up to 
the role." [puppet] - 10https://gerrit.wikimedia.org/r/379726 (https://phabricator.wikimedia.org/T176267) (owner: 10Hashar) [10:03:18] 10Operations, 10vm-requests: EQIAD: 1 VM request for package building - https://phabricator.wikimedia.org/T176607#3631056 (10akosiaris) p:05Triage>03Normal a:03akosiaris Per T176472 [10:03:43] (03CR) 10Muehlenhoff: [C: 032] Revert "Stop including a salt master in the cluster management role" [puppet] - 10https://gerrit.wikimedia.org/r/380456 (owner: 10Muehlenhoff) [10:03:50] 10Operations, 10hardware-requests: New package builder host - https://phabricator.wikimedia.org/T176472#3626672 (10akosiaris) 05Open>03Resolved a:03akosiaris Agreed. See T176607 I 've resolve this one for now and we can always reopen [10:05:29] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631090 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1324.eqiad.wmnet', 'mw1325.eqiad.wmnet... [10:05:52] (03PS1) 10Ema: 1.14.0: prometheus metrics, BGP MED, bugfixes [debs/pybal] - 10https://gerrit.wikimedia.org/r/380459 (https://phabricator.wikimedia.org/T165764) [10:08:26] (03PS1) 10Alexandros Kosiaris: boron.eqiad.wmnet as a package builder host [dns] - 10https://gerrit.wikimedia.org/r/380461 (https://phabricator.wikimedia.org/T176607) [10:09:37] (03PS1) 10Ema: Revert "interface: remove unused definition ::offload" [puppet] - 10https://gerrit.wikimedia.org/r/380462 [10:10:45] (03CR) 10Ema: "interface::offload has been removed, https://gerrit.wikimedia.org/r/380462 adds it back." 
[puppet] - 10https://gerrit.wikimedia.org/r/379800 (owner: 10BBlack) [10:11:22] (03CR) 10Volans: [C: 031] "compiler seems sane to me:" [puppet] - 10https://gerrit.wikimedia.org/r/379525 (owner: 10Muehlenhoff) [10:11:33] (03CR) 10Ema: [C: 031] LVS: turn off ip_early_demux [puppet] - 10https://gerrit.wikimedia.org/r/379798 (owner: 10BBlack) [10:13:42] (03CR) 10Ema: [C: 032] 1.14.0: prometheus metrics, BGP MED, bugfixes [debs/pybal] - 10https://gerrit.wikimedia.org/r/380459 (https://phabricator.wikimedia.org/T165764) (owner: 10Ema) [10:14:02] !log add mw1323 to live traffic (mw appserver) - traffic weights will go from 5 to 20 incrementally [10:14:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:12] (03PS1) 10Volans: wmf-auto-reimage: fix reference to the other script [puppet] - 10https://gerrit.wikimedia.org/r/380464 [10:18:16] (03CR) 10Elukey: [C: 031] wmf-auto-reimage: fix reference to the other script [puppet] - 10https://gerrit.wikimedia.org/r/380464 (owner: 10Volans) [10:18:31] (03CR) 10Volans: [C: 032] wmf-auto-reimage: fix reference to the other script [puppet] - 10https://gerrit.wikimedia.org/r/380464 (owner: 10Volans) [10:19:06] (03PS5) 10Muehlenhoff: Stop using salt minion in production [puppet] - 10https://gerrit.wikimedia.org/r/379525 [10:21:16] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631133 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1324.eqiad.wmnet', 'mw1325.eqiad.wmnet... 
[10:21:37] (03CR) 10Muehlenhoff: [C: 032] Stop using salt minion in production [puppet] - 10https://gerrit.wikimedia.org/r/379525 (owner: 10Muehlenhoff) [10:27:17] (03PS1) 10Alexandros Kosiaris: Add boron as a package builder host [puppet] - 10https://gerrit.wikimedia.org/r/380465 (https://phabricator.wikimedia.org/T176607) [10:35:17] (03PS1) 10Marostegui: db-eqiad.php: Repool db1055 with normal weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380470 [10:35:57] <_joe_> akosiaris: wasn't boron a frack host? [10:37:20] _joe_: yeah but I don't see it in DNS anymore [10:37:52] (03PS1) 10Elukey: mediawiki::web::modules: force dependency between apache confs [puppet] - 10https://gerrit.wikimedia.org/r/380472 [10:38:31] _joe_: https://phabricator.wikimedia.org/rODNSa740c1e465104e3693e6ec7a0e1b85d0c7169a43 [10:38:36] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1055 with normal weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380470 (owner: 10Marostegui) [10:38:43] was removed back on May 4 [10:38:49] it's almost 5 months [10:38:54] I am guessing I can re-use it [10:40:00] (03PS1) 10Muehlenhoff: Remove salt-key from puppet certmanager sudo config [puppet] - 10https://gerrit.wikimedia.org/r/380473 [10:41:15] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1055 with normal weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380470 (owner: 10Marostegui) [10:41:29] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1055 with normal weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380470 (owner: 10Marostegui) [10:42:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1055 weight (duration: 00m 46s) [10:42:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:00] (03PS1) 10Muehlenhoff: Stop providing a salt master for role::cluster::management [puppet] - 10https://gerrit.wikimedia.org/r/380474 [10:46:13] (03PS1) 10Muehlenhoff: Remove salt-key from dc-ops privileges [puppet] - 
10https://gerrit.wikimedia.org/r/380475 [10:48:55] jouncebot: next [10:48:56] In 0 hour(s) and 11 minute(s): Creation of the Hindi Wikivoyage wiki (task T173013) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1100) [10:48:56] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [10:51:29] (03PS1) 10Muehlenhoff: Stop defining salt grain per labs project [puppet] - 10https://gerrit.wikimedia.org/r/380476 [10:51:31] (03PS1) 10Muehlenhoff: Remove support for setting custom Salt grains [puppet] - 10https://gerrit.wikimedia.org/r/380477 [10:53:12] (03CR) 10Alexandros Kosiaris: [C: 032] boron.eqiad.wmnet as a package builder host [dns] - 10https://gerrit.wikimedia.org/r/380461 (https://phabricator.wikimedia.org/T176607) (owner: 10Alexandros Kosiaris) [10:57:36] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/380473 (owner: 10Muehlenhoff) [11:00:05] Dereckson: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Creation of the Hindi Wikivoyage wiki (task T173013) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1100). [11:00:05] No GERRIT patches in the queue for this window AFAICS. [11:00:05] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [11:03:43] Dereckson: so the time has come :) [11:04:22] time for? [11:04:34] Oh yes, your window to create hi.wikivoyage [11:04:40] you cleared the window with greg-g? [11:04:43] :) [11:04:47] hmm [11:05:04] well, nope :( *facepalm* [11:05:49] but I see https://phabricator.wikimedia.org/T173013#3601367 [11:05:57] so it should be fine [11:06:43] I'm here, just in case you need me for something. Unlikely but... well, I'll be around. 
[11:06:58] DNS is okay [11:09:36] (03PS1) 10Volans: sshd: allow to tune MaxSessions and MaxStartups [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) [11:10:31] (03CR) 10Muehlenhoff: [C: 032] Remove salt-key from puppet certmanager sudo config [puppet] - 10https://gerrit.wikimedia.org/r/380473 (owner: 10Muehlenhoff) [11:11:42] (03PS18) 10MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [11:12:27] should be no problem as hi.* is a new sister project of an already existing language [11:12:52] (03CR) 10Volans: "Puppet compiler looks sane to me for prod hosts:" [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) (owner: 10Volans) [11:14:31] (03PS1) 10Gilles: Thumbor: expose Nginx request time [puppet] - 10https://gerrit.wikimedia.org/r/380483 (https://phabricator.wikimedia.org/T161535) [11:14:52] (03PS2) 10Gilles: Thumbor: expose Nginx request time [puppet] - 10https://gerrit.wikimedia.org/r/380483 (https://phabricator.wikimedia.org/T161535) [11:15:30] So we're on .19 and your l10n change is here since .14, that's fine too. [11:15:39] Yes, all seems ready. [11:16:18] Dereckson: I can update wikiversions should we need that [11:16:25] let me do that [11:16:32] so we don't have to do that later? [11:16:55] ehm, forget about that [11:17:10] I didn't add wikiversions.json to that patch to avoid merge conflicts [11:17:15] shall I add it now? [11:17:56] Yes, you can, to 19 so [11:18:17] https://gerrit.wikimedia.org/r/#/c/371109/18/dblists/wikivoyage.dblist <- if you update the change, there is a fix at EOF added by error [11:19:15] Dereckson: I'm adding wikiversions now [11:19:24] checking wikivoyage.dblist [11:19:47] Dereckson: just adding a new empty final line? 
[11:20:41] (03PS19) 10MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [11:21:24] tabbycat: seems so [11:21:27] thanks [11:21:28] (03PS20) 10MarcoAurelio: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) [11:21:32] done [11:22:03] (03CR) 10Dereckson: [C: 032] Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [11:25:47] (03Merged) 10jenkins-bot: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [11:25:58] (03CR) 10jenkins-bot: Initial configuration for hi.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [11:32:45] !log dereckson@tin Synchronized dblists: Add hiwikivoyage to the set (T173013) (duration: 00m 46s) [11:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:52] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [11:33:34] ebernhardson: I've a very few (from a mwscript to a non existing database) 11 error: Uncaught exception 'MWException' with message 'Variable 'wgCirrusSearchInterleaveConfig' is not set.' 
in /srv/mediawiki/php-1.30.0-wmf.19/maintenance/getConfiguration.php:105 [11:34:00] ebernhardson: to reproduce : mwscript --wiki=nonexistingone somethingexisting.php [11:34:38] !log dereckson@tin rebuilt wikiversions.php and synchronized wikiversions files: +hiwikivoyage (T173013) [11:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:34:54] (03PS1) 10Elukey: nutcracker: create the service only after the package install [puppet] - 10https://gerrit.wikimedia.org/r/380487 [11:35:26] marostegui: hola :) wiki creation is happening now. Not sure about the internals though (I cannot see as far as you) :) [11:35:50] tabbycat: I am checking the filtering [11:35:57] chachi [11:35:59] The wiki is working fine, syncing to prod. [11:36:10] chachi XDDD [11:36:13] xD [11:37:07] !log dereckson@tin Synchronized static/images/project-logos/: Logos for hi.wikivoyage.org (T173013) (duration: 00m 45s) [11:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:56] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Initial configuration for hi.wikivoyage.org (T173013) (duration: 00m 46s) [11:38:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:38:01] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [11:39:28] (03PS1) 10Dereckson: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380489 [11:39:43] !log upgrading tor to latest stable release (0.3.1.7) [11:39:46] !log Run redact_sanitarium for hiwikivoyage on db1095 - T173027 [11:39:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:39:52] T173027: Prepare and check storage layer for hi.wikivoyage - https://phabricator.wikimedia.org/T173027 [11:40:41] 10Operations, 10Traffic: Select or Acquire Address Space for Asia Cache DC - 
https://phabricator.wikimedia.org/T156256#3631361 (10faidon) Status update: back in April, APNIC had requested documentation supporting that we have or about to have a presence in the Asia-Pacific region. We didn't have any besides ou... [11:40:53] Dereckson: I got a ticket number for the interwiki map update [11:41:01] mind adding it so I can remove it from SWAT? [11:41:16] sure, which one? [11:41:43] T176572 [11:41:43] T176572: Update the interwiki map, following on-wiki updates - https://phabricator.wikimedia.org/T176572 [11:41:49] Thanks, adding [11:41:52] :) [11:42:02] two birds with one stone [11:42:03] (03PS2) 10Dereckson: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380489 (https://phabricator.wikimedia.org/T176572) [11:42:18] (03CR) 10Dereckson: [C: 032] Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380489 (https://phabricator.wikimedia.org/T176572) (owner: 10Dereckson) [11:43:58] (03Merged) 10jenkins-bot: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380489 (https://phabricator.wikimedia.org/T176572) (owner: 10Dereckson) [11:44:48] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 890.01 seconds [11:45:13] !log dereckson@tin Synchronized wmf-config/interwiki.php: Interwiki map update (+hiwikivoyage and other changes previous month, T176572) (duration: 00m 46s) [11:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:03] (03CR) 10jenkins-bot: Update interwiki map [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380489 (https://phabricator.wikimedia.org/T176572) (owner: 10Dereckson) [11:46:09] noted the update at Meta-Wiki as well [11:47:12] !log Create new database and tables for hi.wikivoyage (T173013) [11:47:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:18] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 
[11:47:23] !log Repopulate sites tables for Wikidata clients (T173013) [11:47:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:56] can the dbstore1002's s3 lag be related to the ongoing hiwikivoyage create index query? [11:51:07] If the wiki is not private, not a Wikimania conference wiki, and not special wiki like usability/outreach/login/vote/strategy/etc., send a change proposal to analytics/refinery.git that adds the wiki to static_data/pageview/whitelist/whitelist.tsv [11:51:16] You've done this part tabbycat? [11:51:34] elukey: hiwikivoyage has well been created on s3 [11:52:36] Dereckson: sure, but I am seeing a create index query on dbstore1002, that is why I was asking :) [11:53:46] Dereckson: yes, and it was merged by nuria_ a couple of days ago [11:54:01] it should be listed in the topic link I posted on the task [11:54:27] 10Operations, 10OCG-General, 10Reading-Community-Engagement, 10Epic, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3631395 (10phuedx) [11:54:47] Add hi.wikivoyage to stats Merged MarcoAurelio analytics/refinery master (T173013) 20. Sep +2 +2 [11:54:48] T173013: Create Wikivoyage Hindi - https://phabricator.wikimedia.org/T173013 [11:54:51] !log removing salt-minion/salt-common from production [11:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:55:52] Dereckson: it was https://gerrit.wikimedia.org/r/#/c/371100/ for reference [11:56:19] (03PS12) 10MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [11:56:43] I added https://gerrit.wikimedia.org/r/#/c/371096/ to Puppet SWAT, but not sure if you want to do it now? 
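Note on the pageview whitelist rule quoted at 11:51:07: the condition can be read as a simple predicate. The following is only a sketch under stated assumptions. The function name, its parameters, and the `SPECIAL_WIKIS` set are hypothetical, and the set lists only the examples named in the quoted text (the original ends with "etc.", so the real list is longer).

```python
# Hypothetical sketch of the quoted rule; not taken from any Wikimedia tool.
# Only the examples named in the quoted message are listed; the real list
# of special wikis is longer ("etc." in the original).
SPECIAL_WIKIS = {"usability", "outreach", "login", "vote", "strategy"}


def needs_pageview_whitelist_entry(name: str, *, private: bool, wikimania: bool) -> bool:
    """Return True when the quoted rule asks for a whitelist.tsv change."""
    if private or wikimania:
        # Private wikis and Wikimania conference wikis are excluded.
        return False
    # Special wikis are excluded as well; everything else qualifies.
    return name not in SPECIAL_WIKIS


print(needs_pageview_whitelist_entry("hi.wikivoyage", private=False, wikimania=False))
```

For hi.wikivoyage the predicate holds, which matches the discussion: the analytics/refinery change had already been merged before the wiki was created.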
[11:57:46] I've not access to that [11:58:10] okay so they'll do that tomorrow [11:58:20] unless _joe_ thinks otherwise [11:58:35] !log powercycle ms-be2020 - blk_update_request: I/O error, dev sdb, sector in consol [11:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:47] I have to leave for half an hour or so, Dereckson. [11:59:16] Okay, thanks for your changes and your support. [11:59:19] I think we're done. [11:59:56] Thanks to you for your help. I think we should keep the task open so the Parsoid, Wikidata, etc. things are done. [12:00:06] ok [12:00:18] PROBLEM - Host ms-be2020 is DOWN: PING CRITICAL - Packet loss = 100% [12:00:24] I don't think I can do any of those (patching I mean) [12:00:28] bbl [12:02:27] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) (owner: 10Volans) [12:05:08] RECOVERY - SSH on ms-be2020 is OK: SSH OK - OpenSSH_7.4p1 Debian-10 (protocol 2.0) [12:05:08] RECOVERY - Host ms-be2020 is UP: PING OK - Packet loss = 0%, RTA = 36.27 ms [12:05:28] RECOVERY - Disk space on ms-be2020 is OK: DISK OK [12:05:48] PROBLEM - DPKG on sarin is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:07:08] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3631438 (10Marostegui) [12:10:29] PROBLEM - DPKG on labcontrol1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:10:38] PROBLEM - salt-minion processes on chlorine is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [12:11:08] PROBLEM - DPKG on neodymium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:11:08] PROBLEM - salt-minion processes on argon is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [12:11:08] PROBLEM - DPKG on labcontrol1001 is CRITICAL: DPKG CRITICAL dpkg reports broken 
packages [12:11:18] fixing that up [12:17:18] PROBLEM - Nginx local proxy to apache on mw2221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:18:08] RECOVERY - Nginx local proxy to apache on mw2221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.192 second response time [12:24:22] (03PS1) 10Volans: wmf-auto-reimage: remove Salt actions [puppet] - 10https://gerrit.wikimedia.org/r/380493 (https://phabricator.wikimedia.org/T164780) [12:24:49] 10Operations, 10DBA: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3631553 (10Marostegui) db1015 was scheduled for decommission: T173570 [12:25:06] 10Operations, 10DBA: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3631555 (10Marostegui) [12:25:29] (03CR) 10Elukey: [C: 031] wmf-auto-reimage: remove Salt actions [puppet] - 10https://gerrit.wikimedia.org/r/380493 (https://phabricator.wikimedia.org/T164780) (owner: 10Volans) [12:25:31] (03PS4) 10Aaron Schulz: Enable logging of post-send DB updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) [12:27:01] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me!" 
[puppet] - 10https://gerrit.wikimedia.org/r/380493 (https://phabricator.wikimedia.org/T164780) (owner: 10Volans) [12:27:56] (03CR) 10Volans: [C: 032] wmf-auto-reimage: remove Salt actions [puppet] - 10https://gerrit.wikimedia.org/r/380493 (https://phabricator.wikimedia.org/T164780) (owner: 10Volans) [12:30:43] (03CR) 10Addshore: "check" [puppet] - 10https://gerrit.wikimedia.org/r/150042 (owner: 10Hashar) [12:31:05] (03CR) 10Addshore: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/150042 (owner: 10Hashar) [12:37:31] (03PS2) 10Volans: sshd: allow to tune MaxSessions and MaxStartups [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) [12:38:23] !lof revoked all salt keys on neodymium [12:38:28] !log revoked all salt keys on neodymium [12:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:37] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631599 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1325.eqiad.wmnet', 'mw1324.eqiad.wmnet', 'mw1326.eqiad.wmnet'] ``` Of which those **FAI... 
[12:39:12] (03PS2) 10Muehlenhoff: Stop providing a salt master for role::cluster::management [puppet] - 10https://gerrit.wikimedia.org/r/380474 [12:40:57] (03PS13) 10MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [12:41:24] (03CR) 10Muehlenhoff: [C: 032] Stop providing a salt master for role::cluster::management [puppet] - 10https://gerrit.wikimedia.org/r/380474 (owner: 10Muehlenhoff) [12:45:16] (03PS2) 10Muehlenhoff: Remove obsolete role::salt::masters::production class [puppet] - 10https://gerrit.wikimedia.org/r/379713 [12:46:48] (03CR) 10Muehlenhoff: [C: 032] Remove obsolete role::salt::masters::production class [puppet] - 10https://gerrit.wikimedia.org/r/379713 (owner: 10Muehlenhoff) [12:46:57] (03PS3) 10Volans: sshd: allow to tune MaxSessions and MaxStartups [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) [12:47:57] (03CR) 10Volans: [C: 032] sshd: allow to tune MaxSessions and MaxStartups [puppet] - 10https://gerrit.wikimedia.org/r/380482 (https://phabricator.wikimedia.org/T176609) (owner: 10Volans) [12:50:58] jouncebot: refresh [12:51:01] I refreshed my knowledge about deployments. 
[12:51:02] jouncebot: next [12:51:03] In 0 hour(s) and 8 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1300) [12:51:18] RECOVERY - DPKG on neodymium is OK: All packages OK [12:52:56] !log removed salt master from neodymium [12:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:38] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3631673 (10MoritzMuehlenhoff) [13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1300). [13:00:04] Jayprakash12345, Amir1, tabbycat, and addshore: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:16] o/ [13:00:35] I can SWAT today [13:02:46] (03CR) 10Ema: [C: 032] Revert "interface: remove unused definition ::offload" [puppet] - 10https://gerrit.wikimedia.org/r/380462 (owner: 10Ema) [13:02:53] (03PS2) 10Ema: Revert "interface: remove unused definition ::offload" [puppet] - 10https://gerrit.wikimedia.org/r/380462 [13:02:55] (03CR) 10Ema: [V: 032 C: 032] Revert "interface: remove unused definition ::offload" [puppet] - 10https://gerrit.wikimedia.org/r/380462 (owner: 10Ema) [13:02:59] wtf is that bot humor on jouncebot [13:03:12] hashar: :D [13:03:14] I like it [13:03:33] Jayprakash12345: around for swat? 
[13:03:55] Amir1: looks like Jayprakash12345 is not around, reviewing your commit [13:04:14] (03PS2) 10Zfilipin: Add several rights to eliminators in fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553) (owner: 10Ladsgroup) [13:04:18] awesome thanks [13:04:34] (03PS2) 10Ema: LVS: turn off ip_early_demux [puppet] - 10https://gerrit.wikimedia.org/r/379798 (owner: 10BBlack) [13:04:45] (03PS3) 10Ema: Global: Turn off ethernet flow for all interfaces [puppet] - 10https://gerrit.wikimedia.org/r/379799 (owner: 10BBlack) [13:04:52] (03PS4) 10Ema: LVS: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379800 (owner: 10BBlack) [13:05:02] (03PS6) 10Ema: Caches: Disable LRO [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack) [13:05:59] zeljkof: my task on this swat window was already done during the creation of the wiki above so there's nothing to do from my side [13:06:06] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553) (owner: 10Ladsgroup) [13:06:36] tabbycat: great, feel free to remove your entry from the swat window :) [13:06:50] zeljkof: I cannot login there right now [13:06:53] I'll try later [13:07:15] nothing wrong with my account though [13:07:55] o/ [13:08:20] addshore: want to deploy your change, or should I? [13:08:34] zeljkof: feel free to deploy it :) thanks! 
[13:08:41] (03Merged) 10jenkins-bot: Add several rights to eliminators in fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553) (owner: 10Ladsgroup) [13:08:51] (03CR) 10jenkins-bot: Add several rights to eliminators in fawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553) (owner: 10Ladsgroup) [13:10:44] (03CR) 10Ema: [C: 031] "LGTM and to pcc https://puppet-compiler.wmflabs.org/compiler02/8010/lvs3002.esams.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/379800 (owner: 10BBlack) [13:10:45] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3631737 (10MoritzMuehlenhoff) [13:10:56] Amir1: the patch is at mwdebug1002, please test and let me know if I can proceed [13:11:05] zeljkof: sure [13:14:06] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631741 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1324.eqiad.wmnet', 'mw1325.eqiad.wmnet... [13:15:17] Amir1: just checking, do you need more time to test? [13:15:29] zeljkof: less than a minute [13:15:39] ok, take your time, just checking :) [13:16:16] zeljkof: it works, please proceed :) [13:16:44] I have a sockpuppet: "🌈" I had to give it eliminator rights and check if it has rollback, etc . 
[13:17:15] https://fa.wikipedia.org/w/index.php?title=%DA%A9%D8%A7%D8%B1%D8%A8%D8%B1:%F0%9F%8C%88&redirect=no [13:17:15] Amir1: deploying :) [13:18:07] (03PS1) 10Elukey: Revert "profile::kafka::broker: add the cluster label to the prometheus metrics" [puppet] - 10https://gerrit.wikimedia.org/r/380497 [13:18:30] (03PS2) 10Elukey: Revert "profile::kafka::broker: add the cluster label to the prometheus metrics" [puppet] - 10https://gerrit.wikimedia.org/r/380497 [13:18:34] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:379980|Add several rights to eliminators in fawiki (T176553)]] (duration: 00m 46s) [13:18:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:40] T176553: Add rollback, autopatrol, extended confirmed, patroller and uploader to eliminators in fawiki - https://phabricator.wikimedia.org/T176553 [13:18:58] Amir1: deployed, if there is anything to check, please do so :) [13:19:22] (03PS1) 10Muehlenhoff: Purge salt-minion and salt-common in labs [puppet] - 10https://gerrit.wikimedia.org/r/380498 [13:19:31] sure [13:20:08] Done, works just fine, thank :) [13:20:11] *thank you [13:20:29] \o/ [13:20:29] Amir1: thanks for releasing with #releng! ;) [13:20:39] (03CR) 10Ema: [C: 031] "Looks good: https://puppet-compiler.wmflabs.org/compiler02/8011/cp3040.esams.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/379801 (owner: 10BBlack) [13:23:22] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3631789 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1324.eqiad.wmnet', 'mw1325.eqiad.wmnet', 'mw1326.eqiad.wmnet'] ``` and were **ALL** suc... 
[13:23:28] addshore: uh, any reason zfilipin@tin:/srv/mediawiki-staging/php-1.30.0-wmf.19/extensions/WikimediaEvents$ is "HEAD detached from 5931c0d" [13:23:45] hmmmmm [13:23:48] * addshore has no idea [13:23:49] * addshore logs in [13:23:53] addshore: should I "git checkout master"? [13:25:22] 5931c0d Kartographer: Protect against undefined data.options [13:25:44] hmm, wait, I think that's normal, isn't it? as they are submodules [13:25:56] not sure [13:25:56] addshore@tin:/srv/mediawiki-staging/php-1.30.0-wmf.19$ git status [13:25:56] On branch wmf/1.30.0-wmf.19 [13:25:59] (03PS9) 10Filippo Giunchedi: rsyslog: add support to receive syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/369950 (https://phabricator.wikimedia.org/T136312) [13:26:04] (03CR) 10Elukey: [C: 032] Revert "profile::kafka::broker: add the cluster label to the prometheus metrics" [puppet] - 10https://gerrit.wikimedia.org/r/380497 (owner: 10Elukey) [13:26:06] (03PS1) 10Muehlenhoff: Remove some salt references [puppet] - 10https://gerrit.wikimedia.org/r/380500 [13:26:21] addshore@tin:/srv/mediawiki-staging/php-1.30.0-wmf.19/extensions/Echo$ git status [13:26:21] HEAD detached at 31709919 [13:26:44] yes, looking at other repos, all are detached HEAD [13:26:59] ok, continuing then as usual, I did not notice that before [13:27:06] just need the git fetch in the mediawiki dir, then git rebase, then git submodule update --init --recursive extensions/WikimediaEvents :) [13:27:18] addshore: want to do it yourself? ;) [13:27:30] I can do! how far have you got? 
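Note on the exchange above: a detached HEAD inside a submodule is expected, because git checks each submodule out at the exact commit recorded by the parent repository rather than at a branch. The three steps named at 13:27:06 can be sketched as follows. This is a minimal illustration, not the deployment tooling itself: the helper name `submodule_sync_commands` is hypothetical, and the sketch only builds the command strings (to be run from the MediaWiki branch directory) rather than executing them. Which upstream `git rebase` targets depends on the branch configuration, which the log does not state.

```python
import shlex
from typing import List


def submodule_sync_commands(extension: str) -> List[str]:
    """Build the three commands mentioned in the log, in order.

    Hypothetical helper for illustration; the extension path layout
    (extensions/<name>) is taken from the log message.
    """
    path = "extensions/" + extension
    return [
        "git fetch",    # fetch new commits for the parent repository
        "git rebase",   # move the checked-out branch onto its upstream
        # check the submodule out at the commit the parent now records
        "git submodule update --init --recursive " + shlex.quote(path),
    ]


for command in submodule_sync_commands("WikimediaEvents"):
    print(command)
```

After the last command the submodule is again on a detached HEAD, now at the newly recorded commit, so "git checkout master" inside the submodule is not needed.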
[13:27:43] did not do anything yet :) [13:27:55] ack, {{doing}} [13:28:06] well, I did merge the commit [13:28:10] ack [13:29:21] (03PS3) 10Zfilipin: Import sources on hr.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379254 (https://phabricator.wikimedia.org/T176320) (owner: 10Jayprakash12345) [13:30:17] syncing [13:30:49] !log addshore@tin Synchronized php-1.30.0-wmf.19/extensions/WikimediaEvents/WikimediaEventsHooks.php: SWAT: [[gerrit:379749]] onBeforeInitializeWMDECampaign, fix early bail condition T174948 (duration: 00m 46s) [13:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:30:55] T174948: Deploy 'hack' patch & logging for tracking user registrations and guided tour - https://phabricator.wikimedia.org/T174948 [13:30:58] zeljkof: thats me all done! [13:31:06] addshore: thanks! [13:31:11] and it all looks good :) [13:31:19] I'll take over the rest [13:31:25] ty! :D [13:31:38] RECOVERY - DPKG on labcontrol1001 is OK: All packages OK [13:31:51] hashar: a question, Jayprakash12345 has two commits for EU SWAT, but he is not around [13:32:12] the commits don't look like they would break stuff, should I just deploy? [13:33:18] zeljkof: can you let me know once swat is finished? 
I'll need to deploy a puppet change, not related, but I want to avoid overlapping [13:33:24] (03PS7) 10Andrew Bogott: nova: turn off hourly instance usage audits [puppet] - 10https://gerrit.wikimedia.org/r/377187 [13:33:44] godog: sure, there are two small commits left, should be done in 10-20 minutes [13:33:52] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379254 (https://phabricator.wikimedia.org/T176320) (owner: 10Jayprakash12345) [13:34:26] 10Operations, 10Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 2 others: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430#3631823 (10Gehel) [13:34:56] (03CR) 10Andrew Bogott: [C: 032] nova: turn off hourly instance usage audits [puppet] - 10https://gerrit.wikimedia.org/r/377187 (owner: 10Andrew Bogott) [13:35:21] tata for now! [13:35:31] (03Merged) 10jenkins-bot: Import sources on hr.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379254 (https://phabricator.wikimedia.org/T176320) (owner: 10Jayprakash12345) [13:35:41] 10Operations, 10Traffic, 10Patch-For-Review: Recurrent 'mailbox lag' critical alerts and 500s - https://phabricator.wikimedia.org/T174932#3631827 (10ema) So it looks like [[ https://gerrit.wikimedia.org/r/#/c/376751/ | stabilizing backend storage patterns]] in combination with [[https://gerrit.wikimedia.org/... 
[13:36:10] (03CR) 10jenkins-bot: Import sources on hr.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379254 (https://phabricator.wikimedia.org/T176320) (owner: 10Jayprakash12345) [13:37:19] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:379254|Import sources on hr.wikibooks (T176320)]] (duration: 00m 46s) [13:37:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:37:24] T176320: Enable special:import in hr.wikibooks.org - https://phabricator.wikimedia.org/T176320 [13:40:05] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377719 (https://phabricator.wikimedia.org/T175721) (owner: 10Urbanecm) [13:40:09] (03PS1) 10Alexandros Kosiaris: Specify keyholder_key in global scap.cfg [puppet] - 10https://gerrit.wikimedia.org/r/380503 (https://phabricator.wikimedia.org/T172333) [13:42:23] (03Merged) 10jenkins-bot: Make hiwiki logo a little bit bigger [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377719 (https://phabricator.wikimedia.org/T175721) (owner: 10Urbanecm) [13:42:36] (03CR) 10jenkins-bot: Make hiwiki logo a little bit bigger [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377719 (https://phabricator.wikimedia.org/T175721) (owner: 10Urbanecm) [13:43:36] I think we need https://wikitech.wikimedia.org/wiki/Add_a_wiki#Parsoid for hi.wikivoyage [13:43:43] but I don't know what to do [13:45:27] !log zfilipin@tin Synchronized static/images/project-logos/hiwiki.png: SWAT: [[gerrit:377719|Make hiwiki logo a little bit bigger (T175721)]] (duration: 00m 46s) [13:45:30] (03PS1) 10Gehel: logstash: api feature logs should be sent to both eqiad and codfw clusters [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) [13:45:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:32] T175721: Make hiwiki logo a little bit bigger - https://phabricator.wikimedia.org/T175721 
[13:46:27] (03PS1) 10Ema: pybal: bind BGP TCP port to private addresses [puppet] - 10https://gerrit.wikimedia.org/r/380508 (https://phabricator.wikimedia.org/T103882) [13:46:38] !log EU SWAT finished [13:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:46:50] godog: EU SWAT done [13:47:17] 10Operations, 10Operations-Software-Development, 10Goal, 10Technical-Debt, 10cloud-services-team (Kanban): Remove salt master (and related packages) from labcontrol1001 - https://phabricator.wikimedia.org/T176632#3631899 (10Andrew) [13:47:19] zeljkof: neat, thanks [13:47:30] (03CR) 10Filippo Giunchedi: [C: 032] rsyslog: add support to receive syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/369950 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [13:47:41] (03PS10) 10Filippo Giunchedi: rsyslog: add support to receive syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/369950 (https://phabricator.wikimedia.org/T136312) [13:49:08] RECOVERY - DPKG on labcontrol1002 is OK: All packages OK [13:51:32] (03PS1) 10Elukey: profile::kafka::broker_prometheus_exp: update delayed op metric [puppet] - 10https://gerrit.wikimedia.org/r/380509 (https://phabricator.wikimedia.org/T175922) [13:51:34] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] rsyslog: add support to receive syslog over TLS [puppet] - 10https://gerrit.wikimedia.org/r/369950 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [13:54:46] (03CR) 10Gehel: "Puppet compiler looks reasonable: https://puppet-compiler.wmflabs.org/compiler02/8013/" [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [13:55:37] (03PS2) 10Filippo Giunchedi: Configure agent to export Cassandra histogram metrics [puppet] - 10https://gerrit.wikimedia.org/r/379610 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [13:56:54] (03CR) 10DCausse: logstash: api feature logs should be sent to both eqiad 
and codfw clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [13:59:52] (03CR) 10Filippo Giunchedi: [C: 032] Configure agent to export Cassandra histogram metrics [puppet] - 10https://gerrit.wikimedia.org/r/379610 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [14:01:15] (03CR) 10Gehel: logstash: api feature logs should be sent to both eqiad and codfw clusters (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:02:10] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/380500 (owner: 10Muehlenhoff) [14:02:39] (03CR) 10DCausse: [C: 031] logstash: api feature logs should be sent to both eqiad and codfw clusters [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:03:23] (03PS14) 10MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [14:03:35] (03PS4) 10Alexandros Kosiaris: mariadb: Implement regular logical backups using mydumper [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [14:05:00] (03CR) 10Alexandros Kosiaris: "I 've taken the liberty to change a few things per my previous comments. 
Lemme know what you think" [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [14:07:19] (03PS2) 10Elukey: profile::kafka::broker_prometheus_exp: update delayed op metric [puppet] - 10https://gerrit.wikimedia.org/r/380509 (https://phabricator.wikimedia.org/T175922) [14:07:24] (03PS2) 10Alexandros Kosiaris: Add boron as a package builder host [puppet] - 10https://gerrit.wikimedia.org/r/380465 (https://phabricator.wikimedia.org/T176607) [14:07:58] (03CR) 10Jcrespo: "All changes are cool, as I said, the main issue is to make it work, not all your suggestions :-)" [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [14:10:47] (03CR) 10Filippo Giunchedi: [C: 032] Thumbor: expose Nginx request time [puppet] - 10https://gerrit.wikimedia.org/r/380483 (https://phabricator.wikimedia.org/T161535) (owner: 10Gilles) [14:10:54] (03PS3) 10Filippo Giunchedi: Thumbor: expose Nginx request time [puppet] - 10https://gerrit.wikimedia.org/r/380483 (https://phabricator.wikimedia.org/T161535) (owner: 10Gilles) [14:11:28] (03PS3) 10Alexandros Kosiaris: Add boron as a package builder host [puppet] - 10https://gerrit.wikimedia.org/r/380465 (https://phabricator.wikimedia.org/T176607) [14:11:42] (03PS2) 10Gehel: logstash: api feature logs should be sent to both eqiad and codfw clusters [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) [14:12:14] (03CR) 10Gehel: [C: 032] logstash: api feature logs should be sent to both eqiad and codfw clusters [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:13:08] (03CR) 10Volans: "Few random comments, feel free to ignore ;)" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [14:13:55] !log rolling restart of logstash for config change - T176430 
[14:13:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:14:00] T176430: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430 [14:15:06] (03CR) 10Ottomata: role::kafka::jumbo::broker: enable Prometheus JMX monitoring (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [14:16:15] (03CR) 10Elukey: "Let's keep discussing in https://phabricator.wikimedia.org/T175922" [puppet] - 10https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [14:17:37] (03CR) 10Jcrespo: [C: 04-1] "Again, you keep giving reviews to "tune up" / improve the script. Let me put it in capital letters- THE SCRIPT DOESNT WORK YET, because IT" [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [14:19:17] (03PS4) 10Alexandros Kosiaris: Add boron as a package builder host [puppet] - 10https://gerrit.wikimedia.org/r/380465 (https://phabricator.wikimedia.org/T176607) [14:19:35] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add boron as a package builder host [puppet] - 10https://gerrit.wikimedia.org/r/380465 (https://phabricator.wikimedia.org/T176607) (owner: 10Alexandros Kosiaris) [14:21:09] 10Operations, 10Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 3 others: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430#3632041 (10Gehel) a:03dcausse API Feature Usage logs are now sent to both codfw and eqiad. @dcausse could you make s... 
[14:21:29] (03PS1) 10Eevans: Follow links (copy the link destination) [puppet] - 10https://gerrit.wikimedia.org/r/380515 (https://phabricator.wikimedia.org/T171772) [14:23:26] (03PS2) 10Ema: pybal: bind BGP TCP port to private addresses [puppet] - 10https://gerrit.wikimedia.org/r/380508 (https://phabricator.wikimedia.org/T103882) [14:23:28] (03PS1) 10Ema: pybal: BGP MED configuration [puppet] - 10https://gerrit.wikimedia.org/r/380516 (https://phabricator.wikimedia.org/T165584) [14:25:37] (03PS2) 10Ema: pybal: BGP MED configuration [puppet] - 10https://gerrit.wikimedia.org/r/380516 (https://phabricator.wikimedia.org/T165584) [14:26:01] (03PS2) 10Eevans: cassandra: Follow links (copy the link destination) [puppet] - 10https://gerrit.wikimedia.org/r/380515 (https://phabricator.wikimedia.org/T171772) [14:26:53] (03PS1) 10Gehel: logstash: fix index name for apifeatureusage [puppet] - 10https://gerrit.wikimedia.org/r/380517 (https://phabricator.wikimedia.org/T176430) [14:27:40] (03PS3) 10Ema: pybal: BGP MED configuration [puppet] - 10https://gerrit.wikimedia.org/r/380516 (https://phabricator.wikimedia.org/T165584) [14:28:12] (03CR) 10Filippo Giunchedi: [C: 032] cassandra: Follow links (copy the link destination) [puppet] - 10https://gerrit.wikimedia.org/r/380515 (https://phabricator.wikimedia.org/T171772) (owner: 10Eevans) [14:29:13] (03CR) 10DCausse: [C: 031] logstash: fix index name for apifeatureusage [puppet] - 10https://gerrit.wikimedia.org/r/380517 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:29:42] 10Operations, 10MobileFrontend, 10Release-Engineering-Team, 10Readers-Web-Backlog (Tracking): Diff page produces 503 on first visit - https://phabricator.wikimedia.org/T176637#3632060 (10Jdlrobson) [14:30:45] (03PS2) 10Gehel: logstash: fix index name for apifeatureusage [puppet] - 10https://gerrit.wikimedia.org/r/380517 (https://phabricator.wikimedia.org/T176430) [14:31:18] (03CR) 10Gehel: [C: 032] logstash: fix index name for apifeatureusage 
[puppet] - 10https://gerrit.wikimedia.org/r/380517 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:31:49] !log rolling restart of logstash for config change - T176430 [14:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:54] T176430: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430 [14:31:59] 10Operations, 10MobileFrontend, 10Release-Engineering-Team, 10Readers-Web-Backlog (Tracking): Diff page produces 503 on first visit - https://phabricator.wikimedia.org/T176637#3632079 (10Jdlrobson) [14:36:53] (03CR) 10Anomie: logstash: api feature logs should be sent to both eqiad and codfw clusters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:37:01] (03CR) 10Elukey: [C: 032] profile::kafka::broker_prometheus_exp: update delayed op metric [puppet] - 10https://gerrit.wikimedia.org/r/380509 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [14:37:07] (03PS3) 10Elukey: profile::kafka::broker_prometheus_exp: update delayed op metric [puppet] - 10https://gerrit.wikimedia.org/r/380509 (https://phabricator.wikimedia.org/T175922) [14:37:39] (03CR) 10Gehel: "@Anomie: Thanks for the comments! Both are fixed in https://gerrit.wikimedia.org/r/#/c/380517/" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/380507 (https://phabricator.wikimedia.org/T176430) (owner: 10Gehel) [14:38:24] ottomata: re: T175922 and the analytics instance, the idea is to have different prometheus instances roughly by team, also for isolation so that many metrics and/or heavy queries do not influence different instances, I proposed the analytics one since I'm assuming for e.g. 
hadoop or druid the metrics would be at that instance [14:38:24] T175922: Use Prometheus for Kafka JMX metrics instead of jmxtrans - https://phabricator.wikimedia.org/T175922 [14:38:48] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:39:02] godog: that sounds good [14:39:14] i thought we were doing it for kafka jumbo [14:39:17] which we shouldn't i think [14:39:22] !log dropping badly named apifeature index from cirrus elasticsearch codfw/eqiad - T176430 [14:39:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:39:28] T176430: api feature logs should be sent to both eqiad and codfw clusters - https://phabricator.wikimedia.org/T176430 [14:44:34] ottomata: because of the extended purpose of kafka-jumbo ? [14:44:42] ya, it is not an analytics box [14:44:53] Luca and I just happen to be the kafka people [14:45:01] analytics cluster* [14:51:48] indeed, ok I think to get started we should go with analytics and see how far that takes us [14:57:01] godog: we should put kafka jumbo in analytics prometheus instance? [14:57:50] ottomata: yeah [14:58:08] pwwwww do we have to? [14:58:32] godog: maybe it'd be better to do like services and not name clusters after teams? [14:58:37] e.g. sca, scb, etc. [14:58:38] ? [15:00:03] ottomata: heh that brings up an interesting point, since 'services' is also a team name, in the context of 'prometheus instances', albeit way too generic imho [15:01:12] ottomata: but yeah I don't have a good answer in this case, I'll make sure though we get to talk about it, especially within our goal next quarter [15:02:10] godog: if we ahve to we have to, but i'd really really rather not associtate kafka with analytics [15:02:32] when we port the main kafka clusters eventually, where should those go? 
[15:04:55] (03PS1) 10Gehel: elasticsearch: only wait 5 minutes for all nodes in case of cold restart [puppet] - 10https://gerrit.wikimedia.org/r/380524 (https://phabricator.wikimedia.org/T176409) [15:05:20] (03PS1) 10Muehlenhoff: Update expiry date for lmixter [puppet] - 10https://gerrit.wikimedia.org/r/380525 [15:05:56] (03CR) 10Muehlenhoff: [C: 032] Update expiry date for lmixter [puppet] - 10https://gerrit.wikimedia.org/r/380525 (owner: 10Muehlenhoff) [15:06:22] ottomata: I'd assumed in the analytics cluster, but if you'd like to avoid associating kafka with analytics, with what should be kafka associated with? [15:06:58] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [15:07:16] i dunno, what should redis be associated with? [15:08:54] (sorry, don't mean to be snarky) [15:09:30] or what would graphite or 'mysql' be associated with?   [15:09:46] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3632248 (10Papaul) The temperature in C6 is about 106 F and the ILO log shows Critical Temperature Threshold Exceeded on 9-24 at 11:30 pm (see attachment) {F9800902} [15:09:51] kafka is a general purpose message transport for many services and systems [15:10:13] no worries, I didn't read it that way, I don't have a good answer ATM to your question tho [15:11:03] godog: for a little more context: https://www.confluent.io/blog/stream-data-platform-1/ (if you are interested) :) [15:12:29] ottomata: thanks, heh yeah in that view then would be 'ops' I'd say, since it is effectively a shared resposability, so is redis perhaps [15:12:41] aye :) [15:13:49] other q godog, elukey and I are debating camel vs lower cased [15:13:54] for metric names [15:14:02] ido you have a preference? [15:14:35] e.g. 
[15:14:36] kafka_server_delayedoperationpurgatory_purgatorysize [15:14:37] vs [15:14:51] kafka_server_DelayedOperationPurgatory_NumDelayedOperations [15:16:04] ottomata: I've seen mostly lowercase but I'd say whichever makes the metric easier to read, I'm checking whether or not say regexp matching is case sensitive [15:18:04] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10hardware-requests, 10netops: unrack/decom pfw1-codfw and pfw2-codfw - https://phabricator.wikimedia.org/T176427#3632297 (10Papaul) [15:18:10] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10hardware-requests, 10netops: unrack/decom pfw1-codfw and pfw2-codfw - https://phabricator.wikimedia.org/T176427#3624959 (10Papaul) pfw1-codfw port 33 pfw2-codfw port 34 [15:19:30] ottomata: so yeah the convention seems to be all lowercase, the names can be changed at display time e.g. in grafana tho [15:20:56] (03CR) 10Jforrester: Enable RemexHTML on several wikis. (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [15:20:59] (03CR) 10Jforrester: [C: 04-1] Enable RemexHTML on several wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [15:24:49] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 219.80 seconds [15:27:07] in grafana? [15:27:20] godog: i find the all lower case really hard to read [15:27:26] if they had underscores, i'd be fore it [15:27:35] but smooshing all lowercase words together is really hard to read [15:28:26] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3632327 (10Marostegui) >>! In T176573#3632248, @Papaul wrote: > The temperature in C6 is about 106 F and the ILO log shows Critical Temperature Threshold Exceeded on 9-24 at 11:30 pm (see a... 
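[editor's note] The naming options debated above (flat lowercase, original CamelCase, and the "lowercase with underscores" variant ottomata says would be readable) can be compared with a small sketch. This is only an illustration: `to_snake_case` is a hypothetical helper, not something jmx_exporter or Prometheus provides, and the JMX-derived metric name is taken from the chat.

```python
import re


def to_snake_case(metric_name: str) -> str:
    """Lowercase a metric name, adding underscores at CamelCase word boundaries.

    Hypothetical helper for illustration only; it is not part of jmx_exporter.
    """
    # Boundary between a lowercase letter or digit and an uppercase letter,
    # e.g. "NumDelayed" -> "Num_Delayed".
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", metric_name)
    # Boundary between an acronym run and a following capitalised word,
    # e.g. "JMXPort" -> "JMX_Port".
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    return name.lower()


original = "kafka_server_DelayedOperationPurgatory_NumDelayedOperations"

# Flat lowercase: the convention godog describes, with words run together.
flat = original.lower()

# Lowercase with word boundaries kept as underscores.
snake = to_snake_case(original)

print(flat)
print(snake)
```

Either lowercase form loses the exact match with the original MBean name, which is the trade-off ottomata raises later in the discussion.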
[15:32:17] ottomata: in grafana yeah, I'm assuming the metrics would need to have their names changed at graph time anyway to e.g. remove the prefixes and so on [15:33:07] ottomata: all lowercase looks like to be a convention only though, it'd be nice to be consistent though I think [15:38:27] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10hardware-requests, 10netops: unrack/decom pfw1-codfw and pfw2-codfw - https://phabricator.wikimedia.org/T176427#3632362 (10Papaul) [15:40:01] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10hardware-requests, 10netops: unrack/decom pfw1-codfw and pfw2-codfw - https://phabricator.wikimedia.org/T176427#3624959 (10Papaul) a:05Papaul>03RobH complete [15:41:10] (03CR) 10Alexandros Kosiaris: "@volans, comments addressed, will upload new change shortly" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [15:42:08] (03CR) 10Alexandros Kosiaris: "@jcrespo, making it work is the easy one. Fixing the details is the difficult one :P. Anyway, got any preference on where I should start t" [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [15:44:02] (03PS6) 10Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) [15:44:20] (03PS7) 10Zoranzoki21: Enable RemexHTML on several wikis. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) [15:44:48] (03PS5) 10Alexandros Kosiaris: mariadb: Implement regular logical backups using mydumper [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [15:44:53] (03CR) 10Zoranzoki21: "@Jforrester foundationwiki removed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [15:45:03] godog it might be nice to be consistent like that if we had control over the metric names in prometheus [15:45:15] but in this case we don't as the JVM and other things create the names [15:45:38] (03PS8) 10Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) [15:49:23] (03Abandoned) 10Thcipriani: WIP: scap: Add a scap::master profile [puppet] - 10https://gerrit.wikimedia.org/r/351179 (owner: 10Thcipriani) [15:51:13] (03PS7) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) [15:52:27] ottomata: yeah I meant consistent within our usage of jmx_exporter, what would be inconvienent imho is to have some jmx_exporter using lowercase and some not [15:53:00] (03CR) 10Subramanya Sastry: "+2 requires resolution of https://phabricator.wikimedia.org/T176150#3629621." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [15:53:47] (03CR) 10Zoranzoki21: "Ok" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [15:57:34] !log T169940: Converting restbase-ng data tables to size-tiered compaction [15:57:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:57:41] T169940: End of September milestone: Start migration of production use cases. - https://phabricator.wikimedia.org/T169940 [16:00:56] 10Operations, 10Ops-Access-Requests: Requesting access to production bastions for cwdent - https://phabricator.wikimedia.org/T176529#3632423 (10Dzahn) a:03Dzahn [16:01:29] ah ok godog that makes sense [16:01:41] (03CR) 10Jcrespo: "This cannot be tested, not in the current state- most shard are missing. we use dbstore2001. Manuel will fill you in better with the detai" [puppet] - 10https://gerrit.wikimedia.org/r/374560 (https://phabricator.wikimedia.org/T169516) (owner: 10Jcrespo) [16:02:43] i argue for uppercases [16:02:44] :) [16:02:54] since they also then match the original mbean metric names [16:03:02] and imo are easier to read [16:04:45] ottomata: ack, I'm also in ops meeting, maybe let's continue afterwards [16:05:13] k [16:05:15] i'm there too :) [16:06:53] (03PS15) 10Zoranzoki21: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [16:06:56] hehe yeah, realized I meant too but wrote also [16:07:04] (03CR) 10Zoranzoki21: [C: 031] Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [16:07:14] (03CR) 10Ayounsi: [C: 031] "Note that eqiad's routers peer with the LVS' public IPs, so this will have to change before we restrict which port pybal listen 
on." [puppet] - 10https://gerrit.wikimedia.org/r/380508 (https://phabricator.wikimedia.org/T103882) (owner: 10Ema) [16:07:58] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3632433 (10Dzahn) [16:08:15] (03CR) 10Zoranzoki21: [C: 031] "Sorry for removing +2 with rebasing" [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [16:09:25] (03CR) 10Ayounsi: [C: 031] pybal: BGP MED configuration [puppet] - 10https://gerrit.wikimedia.org/r/380516 (https://phabricator.wikimedia.org/T165584) (owner: 10Ema) [16:09:30] (03CR) 10Zoranzoki21: [C: 031] Add config for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup) [16:09:36] (03CR) 10Marostegui: [C: 04-1] "-1 this because it has been done already :-)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379757 (owner: 10Jcrespo) [16:09:39] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3245193 (10Dzahn) Saw Icinga alerts for salt-minions not running (on argon and chlorine) "PROCS CRITICAL: 0 processes with regex args '^/usr/b... 
[16:11:28] (03PS2) 10Zoranzoki21: admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [16:11:49] ACKNOWLEDGEMENT - salt-minion processes on argon is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion daniel_zahn https://phabricator.wikimedia.org/T164780 [16:11:49] ACKNOWLEDGEMENT - salt-minion processes on chlorine is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion daniel_zahn https://phabricator.wikimedia.org/T164780 [16:13:10] ACKNOWLEDGEMENT - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[gerrit] daniel_zahn https://phabricator.wikimedia.org/T176532 [16:19:49] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:21:46] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3297505 (10Dzahn) @cmjohnson db1022 and db1023 and their mgmt interfaces are alerting as down in Icinga. Did you shut them down? Looks like they still need to be removed from puppet. 
[16:22:59] ACKNOWLEDGEMENT - Host db1022.mgmt is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T166486 [16:22:59] ACKNOWLEDGEMENT - Host db1023.mgmt is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T166486 [16:23:05] (03PS1) 10Alexandros Kosiaris: Remove old boron reference from netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/380545 [16:23:34] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Remove old boron reference from netboot.cfg [puppet] - 10https://gerrit.wikimedia.org/r/380545 (owner: 10Alexandros Kosiaris) [16:24:56] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3632505 (10Cmjohnson) @dzahn yes they're down... i think these pre-dated the mgmt interface icinga check [16:25:54] 10Operations, 10ops-eqiad, 10DBA, 10Patch-For-Review: Decommission db1023 - https://phabricator.wikimedia.org/T166486#3632506 (10Dzahn) @Cmjohnson Ah, thanks! It's both, the servers and mgmt though. https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&style=hostdetail&hoststatustypes=4&hostprops... [16:34:58] RECOVERY - DPKG on sarin is OK: All packages OK [16:37:04] !log demon@tin Synchronized php-1.30.0-wmf.19/maintenance/getConfiguration.php: getConfiguration: Don't bail when a valid variable is set null (duration: 00m 47s) [16:37:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:14] ebernhardson: ^^^ [16:38:35] (03CR) 1020after4: [C: 031] "I think this is a good idea? 
I'd like thcipriani to weigh in though" [puppet] - 10https://gerrit.wikimedia.org/r/380503 (https://phabricator.wikimedia.org/T172333) (owner: 10Alexandros Kosiaris) [16:39:32] (03CR) 1020after4: [C: 031] Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/377304 (owner: 10Chad) [16:48:12] 10Operations, 10Continuous-Integration-Infrastructure, 10DNS, 10Traffic, and 2 others: CI: operations-dns-lint broken due to missing Maxmind DB file - https://phabricator.wikimedia.org/T175864#3632579 (10greg) a:05hashar>03None Just waiting on puppet merge, unassigning antoine. [16:48:59] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:52:07] 10Operations, 10Contributors-Team, 10MobileFrontend, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), and 2 others: Diff page produces 503 on first visit - https://phabricator.wikimedia.org/T176637#3632602 (10greg) Per Dev/Maintainers adding the Contributors team as maintainers of the diff v... [16:52:24] 10Operations, 10Continuous-Integration-Infrastructure, 10DNS, 10Traffic, and 2 others: CI: operations-dns-lint broken due to missing Maxmind DB file - https://phabricator.wikimedia.org/T175864#3632606 (10Dzahn) I would have merged https://gerrit.wikimedia.org/r/#/c/377986/ but it was blocked by a dependen... 
[16:52:35] 10Operations, 10Gerrit, 10Release-Engineering-Team (Next): Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3632607 (10greg) [16:54:30] (03PS2) 10Mobrovac: Revert "[Logging config] Enable logging for updateBetaFeaturesUserCounts" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378856 (owner: 10Ppchelko) [16:56:41] (03PS3) 10RobH: admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [16:57:26] (03CR) 10Dzahn: [C: 031] "approved in ops meeting" [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [16:57:33] (03CR) 10RobH: [C: 032] admin: Add gjg to contint-admin [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [16:57:44] (03CR) 10RobH: [C: 032] "this was approved in the ops meeting" [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [16:58:16] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2047 got rebooted - https://phabricator.wikimedia.org/T176573#3632627 (10Marostegui) Looking at the PDU temperature graphs, I cannot see anything weird there, so it might have been just a punctual thing with this host. [16:59:22] (03CR) 10Jdlrobson: "needs rebase" [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: 10Bmansurov) [16:59:46] (03CR) 10Mobrovac: [V: 032 C: 032] Revert "[Logging config] Enable logging for updateBetaFeaturesUserCounts" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378856 (owner: 10Ppchelko) [16:59:58] (03CR) 10jenkins-bot: Revert "[Logging config] Enable logging for updateBetaFeaturesUserCounts" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378856 (owner: 10Ppchelko) [17:00:05] gehel: How many deployers does it take to do Wikidata Query Service weekly deploy deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1700). 
[17:00:05] Smalyshev: A patch you scheduled for Wikidata Query Service weekly deploy is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:43] jouncebot: o/ [17:02:03] ottomata: re: jmx_exporter lowercase vs keep case I don't feel strongly about it but we should set on a convention now IMO (cc urandom since we're using jmx_exporter there too) [17:02:12] !log mobrovac@tin Synchronized wmf-config/InitialiseSettings.php: Disable the updateBetaFeaturesUserCounts logging channel - T175637 (duration: 00m 46s) [17:02:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:18] T175637: End of September milestone: Migrate first production use case - https://phabricator.wikimedia.org/T175637 [17:02:26] akosiaris: is "testing serviceAccounts" on kubernetes hosts still ongoing ? [17:02:46] godog: do we have an agreement about a specific prometheus instance for kafka? [17:02:52] or is it still tbd? [17:03:05] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625#3598199 (10RobH) [17:03:35] (03PS3) 10Bmansurov: Implement Schema:Print purging strategy [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) [17:03:41] mutante: yes it is [17:03:58] elukey: heh I think it'll be the 'ops' instance if kafka is going to be a shared resposanbility anyway [17:04:15] mutante: I 've decided though to postpone it to next quarter [17:04:21] lemme fix the code [17:04:30] akosiaris: ok, so i just ask that because argon and chlorine stand out in Icinga, for salt-minions.. and that is just because puppet didnt run to remove it [17:05:03] ok, will fix now [17:05:06] thanks :) [17:05:26] godog: super, so whenever we are ok could we add the job to it? 
(maybe tomorrow morning, I can try to come up with a patch) [17:05:50] !log gehel@tin Started deploy [wdqs/wdqs@3eec185]: Deploy new Blazegraph binary & GUI update [17:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:05:57] (03PS1) 10Alexandros Kosiaris: Remove ServiceAccount from kubernetes master [puppet] - 10https://gerrit.wikimedia.org/r/380550 [17:06:15] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Remove ServiceAccount from kubernetes master [puppet] - 10https://gerrit.wikimedia.org/r/380550 (owner: 10Alexandros Kosiaris) [17:07:40] !log gehel@tin Finished deploy [wdqs/wdqs@3eec185]: Deploy new Blazegraph binary & GUI update (duration: 01m 51s) [17:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:07:51] SMalyshev: ^ done, all tests are green... [17:08:17] gehel: great, thanks! [17:09:21] (03CR) 10Dzahn: "ran puppet on contint1001/2001. Admin::User[gjg]/User[gjg]/ensure: created" [puppet] - 10https://gerrit.wikimedia.org/r/379932 (owner: 10Greg Grossmeier) [17:09:35] elukey: yeah tomorrow morning is fine! [17:09:37] I'm off [17:09:56] mutante: done [17:10:04] akosiaris: :) thx [17:14:09] (03PS1) 10Jdlrobson: Disable RelatedArticles instrumentation on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380554 (https://phabricator.wikimedia.org/T174944) [17:16:27] 10Operations, 10Edit-Review-Improvements, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3632705 (10jmatazzoni) a:03Catrope [17:23:55] (03CR) 10Addshore: "So, I would be PRO trying to do the R tests in docker." [puppet] - 10https://gerrit.wikimedia.org/r/363337 (https://phabricator.wikimedia.org/T153856) (owner: 10Hashar) [17:23:57] (03CR) 10Zoranzoki21: [C: 031] "Looks good for me.. 
I only waited jenkins to verify" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380554 (https://phabricator.wikimedia.org/T174944) (owner: 10Jdlrobson) [17:27:20] (03CR) 10Bmansurov: [C: 04-1] Disable RelatedArticles instrumentation on all wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380554 (https://phabricator.wikimedia.org/T174944) (owner: 10Jdlrobson) [17:27:27] (03PS16) 10Zoranzoki21: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [17:27:59] (03CR) 10Zoranzoki21: [C: 031] "I again removed Cannot Merge.. And after again will be message Cannot Merge" [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [17:32:28] bblack: able to make that meeting in 30 mins or not so much? [17:33:03] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3632793 (10Dzahn) [17:33:57] bblack: disregard, saw the follow up email [17:38:43] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3632807 (10Dzahn) Nevermind, it was already removed by Moritz. These alerts i saw were just still there because puppet was disabled on these ho... [17:52:06] 10Operations, 10Ops-Access-Requests: Requesting access to production bastions for cwdent - https://phabricator.wikimedia.org/T176529#3632824 (10Dzahn) We talked about this on IRC. Not a problem. It's not the first time cwdent gets shell access, it's just partially reverting his removal, (was removed in an audi... 
[17:52:24] (03CR) 10Zoranzoki21: [C: 031] "Looks good for me" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378784 (https://phabricator.wikimedia.org/T176174) (owner: 10Greg Grossmeier) [17:53:11] 10Operations, 10Ops-Access-Requests: Requesting access to production bastions for cwdent - https://phabricator.wikimedia.org/T176529#3628771 (10Dzahn) Adding @herron as the "on duty" ops this week. [17:53:26] 10Operations, 10Ops-Access-Requests: Requesting access to production bastions for cwdent - https://phabricator.wikimedia.org/T176529#3632827 (10Dzahn) p:05Triage>03Normal [17:55:05] jouncebot: now [17:55:05] No deployments scheduled for the next 0 hour(s) and 4 minute(s) [17:55:09] jouncebot: next [17:55:10] In 0 hour(s) and 4 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1800) [17:58:07] !log demon@tin Pruned MediaWiki: 1.30.0-wmf.15 (duration: 02m 46s) [17:58:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 8 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:00:28] horoscope prediction fail [18:00:57] (03Draft2) 10Zoranzoki21: Change Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380563 [18:01:15] (03PS1) 10Dzahn: admins: partially re-enable shell access for cwdent [puppet] - 10https://gerrit.wikimedia.org/r/380565 (https://phabricator.wikimedia.org/T176529) [18:01:31] (03PS3) 10Zoranzoki21: Change Turkish Wiktionary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380563 (https://phabricator.wikimedia.org/T176008) [18:01:51] (03CR) 10EBernhardson: [C: 031] "Seems sane. 
Perhaps add a small section to https://wikitech.wikimedia.org/wiki/Search about cold-boot behaviour recording what we learned." [puppet] - 10https://gerrit.wikimedia.org/r/380524 (https://phabricator.wikimedia.org/T176409) (owner: 10Gehel) [18:02:50] (03CR) 10Krinkle: [C: 031] Enable asia-specific Navigation Timing metric [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380441 (https://phabricator.wikimedia.org/T169522) (owner: 10Gilles) [18:08:42] (03PS1) 10Chad: Forgotten comma [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380566 [18:08:45] (03PS8) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) [18:09:16] (03CR) 10Zoranzoki21: "Again Cannot merge message.. Sorry for spamming with rebasing" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:10:04] (03CR) 10Dzahn: [C: 04-1] "slaporte is already an LDAP-only admin (in LDAP with wmf group but no shell access) but he needs to be removed from that part of the file " [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:10:21] (03CR) 10Thcipriani: [C: 031] Forgotten comma [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380566 (owner: 10Chad) [18:11:31] (03CR) 10Dzahn: [C: 04-1] "there is no need to rebase and worry about "cannot merge" in this repo. it's normal behaviour here due to the "FastForward only" Strategy." [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:13:02] (03CR) 10Zoranzoki21: "Ok.. Can you select on what you think to need be removed?" 
[puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:14:42] (03CR) 10Dzahn: [C: 04-1] Access for Slaporte (Stephen LaPorte) to stat1005 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:14:53] (03CR) 10Dzahn: [C: 04-1] "i marked it in inline comments" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:16:17] (03PS9) 10Zoranzoki21: Access for Slaporte (Stephen LaPorte) to stat1005 [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) [18:16:37] (03CR) 10Zoranzoki21: "Ok.. Now is ok?" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:21:30] (03CR) 10Dzahn: [C: 031] "yea, this looks good now. the only nitpicks i have are: please start commit message with "admins: " and i am not sure about the capital P " [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:21:56] (03CR) 10Chad: [C: 032] Forgotten comma [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380566 (owner: 10Chad) [18:22:49] (03PS1) 10Sowjanyavemuri: Microtask for Outreachy(Round15) that describes the understanding of the webservice commands. webservice --backend kubernetes start webservice --backend kubernetes stop [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/380568 [18:23:27] (03CR) 10Zoranzoki21: "Super. Thank you very much.. 
No problem for rule 3 days waiting" [puppet] - 10https://gerrit.wikimedia.org/r/379851 (https://phabricator.wikimedia.org/T176518) (owner: 10Zoranzoki21) [18:24:35] (03Merged) 10jenkins-bot: Forgotten comma [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380566 (owner: 10Chad) [18:25:39] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3632970 (10Zoranzoki21) p:05Triage>03Low a:03Zoranzoki21 [18:26:10] (03CR) 10jenkins-bot: Forgotten comma [mediawiki-config] - 10https://gerrit.wikimedia.org/r/380566 (owner: 10Chad) [18:26:14] (03CR) 10Zoranzoki21: [C: 031] admins: partially re-enable shell access for cwdent [puppet] - 10https://gerrit.wikimedia.org/r/380565 (https://phabricator.wikimedia.org/T176529) (owner: 10Dzahn) [18:29:09] (03CR) 10Faidon Liambotis: [C: 04-1] "This is old code, so when did it stop working and why?" [puppet] - 10https://gerrit.wikimedia.org/r/377986 (https://phabricator.wikimedia.org/T175864) (owner: 10Hashar) [18:30:52] (03Abandoned) 10Sowjanyavemuri: Microtask for Outreachy(Round15) that describes the understanding of the webservice commands. webservice --backend kubernetes start webservice --backend kubernetes stop [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/380568 (owner: 10Sowjanyavemuri) [18:31:38] (03Restored) 10Sowjanyavemuri: Microtask for Outreachy(Round15) that describes the understanding of the webservice commands. webservice --backend kubernetes start webservice --backend kubernetes stop [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/380568 (owner: 10Sowjanyavemuri) [18:38:54] !log demon@tin Synchronized scap/plugins/clean.py: no-op/consistency (duration: 00m 45s) [18:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:41:37] (03PS2) 10Sowjanyavemuri: Microtask for Outreachy(Round15) that describes the understanding of the webservice commands. 
webservice --backend kubernetes start webservice --backend kubernetes stop [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/380568 [18:43:45] (03CR) 10Krinkle: "Aye, we'll need to arrange prod shell access for analytics then (). Which, btw " [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [18:44:38] (03CR) 10Ottomata: "+1 in general, one nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [18:51:13] 10Operations, 10Edit-Review-Improvements, 10Collaboration-Team-Triage (Collab-Team-Q1-Jul-Sep-2017), 10Performance: Systematically test load speeds of Watchlist and Recent Changes - https://phabricator.wikimedia.org/T176445#3633009 (10jmatazzoni) a:05Catrope>03Mattflaschen-WMF [18:53:23] (03CR) 10Krinkle: [C: 031] "Scheduled for Puppet SWAT next Tuesday, September 26." [puppet] - 10https://gerrit.wikimedia.org/r/353228 (https://phabricator.wikimedia.org/T107128) (owner: 10Tim Starling) [18:54:04] !log started populating ip_changes on group2 wikis [18:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:11] 10Operations, 10Gerrit, 10Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3633014 (10Dzahn) a:03Dzahn [18:57:19] 10Operations, 10Gerrit, 10Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3368352 (10Dzahn) p:05Lowest>03Normal [18:59:02] 10Operations, 10Gerrit, 10Release-Engineering-Team (Next): Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3633018 (10Dzahn) We'll do T168562 and reinstall this box with stretch to, ideally, kill 2 birds with one stone. Confirm if this issue goes away and see i... 
[19:02:26] !log robh@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp4021.ulsfo.wmnet [19:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:43] !log cp4021 coming down for memory swap [19:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:00] no matter what i do its gonna complain since every cp system connects to it for an icinga check [19:03:08] even if i maint mode the actual cp4021 node ;] [19:05:06] (03PS4) 10Dzahn: Gerrit: Remove gc logging [puppet] - 10https://gerrit.wikimedia.org/r/379946 (owner: 10Paladox) [19:06:11] (03CR) 10Dzahn: [C: 032] Gerrit: Remove gc logging [puppet] - 10https://gerrit.wikimedia.org/r/379946 (owner: 10Paladox) [19:06:20] thanks [19:07:10] (03PS1) 10Faidon Liambotis: interface::offload: remove hardcoded default eth0 [puppet] - 10https://gerrit.wikimedia.org/r/380574 [19:07:41] (03CR) 10Faidon Liambotis: [C: 032] interface::offload: remove hardcoded default eth0 [puppet] - 10https://gerrit.wikimedia.org/r/380574 (owner: 10Faidon Liambotis) [19:07:48] (03PS2) 10Faidon Liambotis: interface::offload: remove hardcoded default eth0 [puppet] - 10https://gerrit.wikimedia.org/r/380574 [19:11:44] (03PS10) 10Dzahn: icinga: initial whitelist for screen monitoring [puppet] - 10https://gerrit.wikimedia.org/r/377823 (https://phabricator.wikimedia.org/T165348) [19:12:04] PROBLEM - Host cp4021 is DOWN: PING CRITICAL - Packet loss = 100% [19:12:36] (03CR) 10Dzahn: [C: 032] "thank you Alex, Jaime, Filippo. If something is missing i will follow-up. Going ahead with this one for now. That's why it was just "initi" [puppet] - 10https://gerrit.wikimedia.org/r/377823 (https://phabricator.wikimedia.org/T165348) (owner: 10Dzahn) [19:13:44] PROBLEM - Host cp4021.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [19:15:39] (03CR) 10Ottomata: "Actually, on second thought, shouldn't we merge profile::druid::common and profile::druid::worker? 
I can't think of a time when they'd be" [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [19:18:44] (03CR) 10Ottomata: "I was trying to sync up the LVS patch with this. Since we're going to have multiple druid clusters, we're going to need a druid_cluster_n" [puppet] - 10https://gerrit.wikimedia.org/r/380449 (https://phabricator.wikimedia.org/T176223) (owner: 10Elukey) [19:18:54] RECOVERY - Host cp4021.mgmt is UP: PING OK - Packet loss = 0%, RTA = 79.16 ms [19:20:09] 10Operations, 10Analytics-Cluster, 10Analytics-Kanban, 10User-Elukey: thorium - failed git clone of geowiki-data-private - https://phabricator.wikimedia.org/T171923#3633057 (10Ottomata) Can/should we take this out of Kanban? [19:21:34] (03PS1) 10Dzahn: WMCS: fix access for wmcs-admin to labsdb [puppet] - 10https://gerrit.wikimedia.org/r/380575 [19:22:41] (03CR) 10Dzahn: "well, i should not have removed those 2 extra lines in file labs_deprecated.yaml, that happened in rebase. i noticed on merge.. fixing it" [puppet] - 10https://gerrit.wikimedia.org/r/377823 (https://phabricator.wikimedia.org/T165348) (owner: 10Dzahn) [19:23:34] RECOVERY - Host cp4021 is UP: PING OK - Packet loss = 0%, RTA = 78.83 ms [19:23:51] (03PS17) 10MarcoAurelio: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) [19:23:59] (03CR) 10Dzahn: [C: 032] WMCS: fix access for wmcs-admin to labsdb [puppet] - 10https://gerrit.wikimedia.org/r/380575 (owner: 10Dzahn) [19:25:08] (03CR) 10MarcoAurelio: "Please leave this patch as is. 
There's a lot of changes to the puppet all day long, so it's not strange that there are merge conflicts in " [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [19:25:14] !log robh@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp4021.ulsfo.wmnet [19:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:38] gwicke: may I please delete the 'docker-testing' VM? Its drive has been full for quite some time and it can't possibly be doing anything useful... [19:30:40] (03CR) 10Dzahn: [C: 031] "lgtm. re: rebasing: yea, no need to, see my comment on https://gerrit.wikimedia.org/r/#/c/379851/ the same applies here as well. it's bec" [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [19:30:56] (03Abandoned) 10Jcrespo: Pool db1055 with full weight, remove main traffic from rc replicas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379757 (owner: 10Jcrespo) [19:31:52] (Could someone within poking distance of gwicke poke him? I've asked him this on irc a few times and also emailed, so far no response.) [19:31:53] (03CR) 10Jcrespo: "Remember to remove all temporay comments, I think "low load" db1055 is still on HEAD? I could be wrong." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379757 (owner: 10Jcrespo) [19:32:10] andrewbogott: if you dont mind, this would be a labs DNS addition, new project hi.wikivoyage https://gerrit.wikimedia.org/r/#/c/371096/ [19:33:00] (03PS18) 10Andrew Bogott: Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [19:33:56] :) thanks [19:34:07] thanks, as owner of the patch as well [19:34:10] abartov: I can't reach openocr.openocr.eqiad.wmflabs. Can you? 
[19:34:11] 10Operations, 10ops-ulsfo, 10Traffic: cp4021 memory hardware issue - DIMM B1 - https://phabricator.wikimedia.org/T175585#3633094 (10RobH) {F9804322} Replaced the bad memory dimm, will drop off shipment in usps mailbox. [19:34:18] 10Operations, 10ops-ulsfo, 10Traffic: cp4021 memory hardware issue - DIMM B1 - https://phabricator.wikimedia.org/T175585#3633095 (10RobH) 05Open>03Resolved a:03RobH [19:34:19] (03CR) 10Andrew Bogott: [C: 032] Cloud VPS configuration for hi.wikivoyage [puppet] - 10https://gerrit.wikimedia.org/r/371096 (https://phabricator.wikimedia.org/T173013) (owner: 10MarcoAurelio) [19:36:41] (03PS1) 10Volans: Cumin: add alias for WMCS [puppet] - 10https://gerrit.wikimedia.org/r/380577 [19:37:26] jynus: / marostegui can you check if there are lag, etc. on hosts relating to hi.wikivoyage? Wikidata links ain't working. [19:37:50] s/ain't working/ain't being displayed despite having been added in the database [19:38:33] addshore: you work for Wikidata? Anything you can do? ^ [19:38:52] 10Operations, 10ops-ulsfo, 10netops: connect new office link to asw-ulsfo - https://phabricator.wikimedia.org/T176350#3633114 (10RobH) All onsite work has been done up to the point of UnitedLayer connecting when the new port is attached to the panel. An email was sent to UL today: > Carlos, > > As we di... [19:39:09] (03CR) 10Volans: [C: 032] Cumin: add alias for WMCS [puppet] - 10https://gerrit.wikimedia.org/r/380577 (owner: 10Volans) [19:41:55] tabbycat, what do you mean with "wikidata links"? [19:42:12] do you mean production? 
[19:42:13] jynus: interwikis on the pages ain't displayed [19:42:20] despite being added to Wikidata [19:42:32] that must be a deployment problem/configuration [19:42:46] if there are any lag that might be -or- yep, a deployment issue [19:42:50] if hewikivoyage was delayer, other 8000 wikis would be too [19:42:55] *800 [19:42:57] hi* [19:43:19] cannot have lag on a single wiki :-), except enwiki or commons [19:44:00] (03CR) 10Hashar: "I don't know why it stopped working, but role::puppetmaster::standalone no more ship the GeoIP datafiles on the puppet master." [puppet] - 10https://gerrit.wikimedia.org/r/377986 (https://phabricator.wikimedia.org/T175864) (owner: 10Hashar) [19:44:07] I know sometimes people have to do some tweaking after deployment [19:44:20] I'll reopen the task [19:44:36] but I cannot help there, it is a deployment issue [19:44:45] (assuming that) [19:55:44] 10Operations, 10Analytics, 10Analytics-Cluster, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3633236 (10dr0ptp4kt) Following up on this, I arranged some time with @Ottomata to take a look into this. [19:58:14] !log renumbering ams BGP communities - T167840 [19:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:58:21] T167840: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840 [19:59:33] (03PS9) 10Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. 
[20:00:25] jouncebot: nothing for ORES today [20:01:02] (03CR) 10Zoranzoki21: "Sorry for spamming with rebasing, but I know what is problem.. Everyday much times InitaliseSettings.php is being modified because it is o" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: 10Zoranzoki21) [20:01:04] arlo will be deploying parsoid today [20:01:33] no mobileapps deploy today [20:13:39] !log arlolra@tin Started deploy [parsoid/deploy@376faad]: Updating Parsoid to 4230c27b [20:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:09] herron: just wanted to bring this to your attention i wasnt quite sure whether or not to tag this with ops T176666 [20:14:10] T176666: Qualtrics email-LDAP issue - https://phabricator.wikimedia.org/T176666 [20:30:32] !log arlolra@tin Finished deploy [parsoid/deploy@376faad]: Updating Parsoid to 4230c27b (duration: 16m 52s) [20:30:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:26] (03CR) 10Krinkle: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/379830 (owner: 10Krinkle) [20:37:20] !log Updated Parsoid to 4230c27b (T176425, T173029) [20:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:26] T176425: visual editor inserts a reference tag - https://phabricator.wikimedia.org/T176425 [20:37:27] T173029: Parsoid configuration for hi.wikivoyage - https://phabricator.wikimedia.org/T173029 [20:42:42] (03CR) 10Chad: [C: 031] Make it easy to set PHP ini flags with mwscript [puppet] - 10https://gerrit.wikimedia.org/r/378007 (owner: 10Aaron Schulz) [20:43:21] (03PS2) 10MaxSem: Make it easy to set PHP ini flags with mwscript [puppet] - 10https://gerrit.wikimedia.org/r/378007 (owner: 10Aaron Schulz) [20:43:56] (03CR) 10MaxSem: [C: 031] Make it easy to set PHP ini flags with mwscript [puppet] - 10https://gerrit.wikimedia.org/r/378007 (owner: 10Aaron Schulz) [20:48:43] (03PS1) 10Hoo 
man: Set a reasonable --batch-size for Wikidata entity dumps [puppet] - 10https://gerrit.wikimedia.org/r/380628 [20:50:12] (03CR) 10Hoo man: "Also updates the minimal expected file size." [puppet] - 10https://gerrit.wikimedia.org/r/380628 (owner: 10Hoo man) [20:53:32] 10Operations, 10Performance-Team (Radar), 10User-Joe: Logic problem in puppet.git tests - https://phabricator.wikimedia.org/T176671#3633414 (10Krinkle) [20:53:38] 10Operations, 10Puppet, 10Performance-Team (Radar), 10User-Joe: Logic problem in puppet.git tests - https://phabricator.wikimedia.org/T176671#3633426 (10Krinkle) [20:53:57] https://www.productsandservices.bt.com/products/smart-hub/?s_cid=con_ppc_maxus_vidZ60_T1&vendorid=Z60&gclid=CjwKCAjw0qLOBRBUEiwAMG5xMMG81Y60a1kgSVfmgjR0UT2rGeXVEWsJ-yLgKRJj0T2fvOFJqe_sbxoC9MkQAvD_BwE&gclsrc=aw.ds&dclid=COvtno2cwdYCFYTjGwodfTEJJQ [20:53:58] woops [20:54:00] sorry [20:54:04] that's for someone else [20:57:29] !log Manually pruned the sites and site_identifiers table on hiwikivoyage, than ran populateSitesTable.php. (T173030) [20:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:35] T173030: Wikidata support for newly created hi.wikivoyage - https://phabricator.wikimedia.org/T173030 [20:57:53] (03CR) 10Krinkle: [C: 031] "Scheduled for Puppet SWAT tomorrow." 
[puppet] - 10https://gerrit.wikimedia.org/r/378007 (owner: 10Aaron Schulz) [20:58:26] Dereckson: https://phabricator.wikimedia.org/T173030#3633431 Please watch for errors when populating the sites table [20:58:32] This is a well known bug :/ [20:59:12] Please see https://wikitech.wikimedia.org/wiki/Add_a_wiki#Wikidata [20:59:34] (03PS9) 10Krinkle: [WIP] webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830 [20:59:42] (03PS8) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) [21:00:04] dapatrick, bawolff, and Reedy: My dear minions, it's time we take the moon! Just kidding. Time for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T2100). [21:00:04] No GERRIT patches in the queue for this window AFAICS. [21:00:34] hoo: noted [21:03:04] 10Operations, 10Puppet, 10Performance-Team (Radar), 10User-Joe: Logic problem in puppet.git tests - https://phabricator.wikimedia.org/T176671#3633449 (10Krinkle) I've solved the immediate problem with @joe over IRC. I didn't add `webperf` to the list of default tests to run via `test -r` (fixed: (03PS10) 10Krinkle: webperf: Add navtiming tests to puppet.git:/tox.ini [puppet] - 10https://gerrit.wikimedia.org/r/379830 [21:05:03] (03PS9) 10Krinkle: webperf: Fix crash when event contains browser_major:null [puppet] - 10https://gerrit.wikimedia.org/r/379820 (https://phabricator.wikimedia.org/T176149) [21:18:42] tabbycat: just saw the ping what's up? [21:19:15] addshore: Marius fixed the issue. It was the populate script failing once more [21:19:27] re-run and fixed [21:19:28] Ack :) [21:19:35] thanks nonetheless :) [21:40:14] (03CR) 10EBernhardson: "looks like 5.3.x plugins are now deployed everywhere, should be good to merge?" 
[software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/376477 (https://phabricator.wikimedia.org/T175159) (owner: 10DCausse) [21:43:11] (03PS1) 10Volans: OpenStack: limit grammar to not overlap the global one [software/cumin] - 10https://gerrit.wikimedia.org/r/380653 [21:57:13] 10Operations, 10Mail: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153#3633617 (10Aklapper) Uh, I see - I'm sorry! [Please adjust associated projects in such cases, otherwise folks will not see this task on the their workboard.] [21:58:17] (03PS2) 10Volans: OpenStack: limit grammar to not overlap the global one [software/cumin] - 10https://gerrit.wikimedia.org/r/380653 [21:59:59] 10Operations, 10Mail, 10Surveys: Qualtrics email-LDAP issue - https://phabricator.wikimedia.org/T176666#3633620 (10Aklapper) [Please add corresponding projects to tasks, otherwise tasks will never appear on their workboards. Adding #mail + #operations as that's also the tags on T159750 ] [22:15:47] there's no way to just diable a mailing list (keeping archives) is there? the googles suggest emergency moderation, but that still spams me (the only remaining owner) [22:16:09] greg-g: there's a script to disable a mailing list [22:16:22] oh? [22:16:32] disable, not delete, right? [22:16:45] greg-g: https://wikitech.wikimedia.org/wiki/Mailman#Disable_or_re-enable_a_mailing_list [22:16:51] disable [22:17:03] (03CR) 10Krinkle: "I've e-mailed a sample capture to start with :)" [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [22:17:06] herron will know better though [22:17:45] (03CR) 10Krinkle: "Also, I asked Ops with help to have these tests run automatically in Jenkins. This is now working. 
Feel free to rebase this and the to-be-" [puppet] - 10https://gerrit.wikimedia.org/r/375345 (https://phabricator.wikimedia.org/T104902) (owner: 10Phedenskog) [22:17:55] tabbycat: got it, that's a good list of what to do, then I could just remove myself as owner, maybe :) [22:18:38] greg-g: ./disable_list [22:18:46] on... rutherfordium? [22:18:53] can't remember the mailman db [22:19:31] I don't have access there :) [22:19:35] fermium [22:19:57] oh? :O no way! I though greg-g had access to everything [22:20:10] :) gladly not [22:20:39] greg-g: just file a task, leaving a list without owners is not a good idea; and disabling it should fix the problem :) [22:21:19] leaving the list without owners is what is suggested there :) [22:21:36] you can always add a new owner if needed via the cli [22:21:57] the removal of the owner is so they don't get notifications of spam messages that are held in moderation (as it sets to emergerncy moderation of all messages) [22:22:27] true [22:22:40] I stopped reading after ./disable_list :P [22:24:13] greg-g: we can disable a list and leave the archives in place... [22:24:20] we have quite a few [22:24:27] as long as its public archives, its clean enough [22:24:42] if they are private archives, its messy since no one can reset their own password to access the archives once the list is disabled. [22:26:19] robh: just filed the task, i couldn't find some of the options the script sets: https://phabricator.wikimedia.org/T176679 [22:26:59] greg-g: so disabled but leave the archives intact correct? [22:27:10] yup [22:28:36] done [22:28:44] ./disable_list mediawiki-core on fermium should do [22:29:04] yeah ive done many times before [22:29:07] =] [22:29:16] thanks man [22:29:17] our wiitech documentation for our mailman instance is nice! [22:29:25] (i had zero to do with said documentation) [22:30:03] now greg-g should remove himself from the admin list and we're all set [22:30:09] right? 
[22:30:47] :) using ./disable_list instead of doing manually = good :) [22:30:48] i thought he had done so, will do now [22:30:58] +1 [22:31:29] yeah would be nice if disable lists stripped it out but i get why it doesnt (it could be a temp disable or whatever) [22:31:40] if I could have found all the "emergency_option_2" etc I would have done it :P [22:31:52] also if it doesnt null out the moderator and admin fields its not really done [22:33:41] sounds like it may need some kind of --hard [22:33:48] parameter [22:34:24] !log gerrit2001 - schedule downtime for reinstall with strech, reboot imminent [22:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:35:09] for the invaluable help provided, I appoint myself and award myself the chief mouser phabricator badge :P [22:38:29] aha [22:38:32] woosps [22:39:07] (03PS1) 10Dzahn: install: let gerrit2001 use stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/380656 (https://phabricator.wikimedia.org/T168562) [22:39:26] (03CR) 10Paladox: [C: 031] install: let gerrit2001 use stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/380656 (https://phabricator.wikimedia.org/T168562) (owner: 10Dzahn) [22:39:53] good night [22:40:36] (03PS2) 10Dzahn: install: let gerrit2001 use stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/380656 (https://phabricator.wikimedia.org/T168562) [22:41:17] (03CR) 10Dzahn: [C: 032] install: let gerrit2001 use stretch installer [puppet] - 10https://gerrit.wikimedia.org/r/380656 (https://phabricator.wikimedia.org/T168562) (owner: 10Dzahn) [22:53:34] !log gerrit2001 - PXE booting [22:53:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy Evening SWAT (Max 8 patches). Get on with it. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170925T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:02:16] 4pm?! [23:04:15] ? is it unsual? daylight savings? [23:04:18] Debian GNU/Linux 9 gerrit2001 ttyS1 [23:04:21] [23:04:24] gerrit2001 login: [23:04:26] paladox: ^ [23:04:40] :) [23:05:40] mutante: more "it's already 4pm?!" [23:06:11] oh, i thought you were saying the bot got the wrong deployment time :) [23:06:23] :) [23:07:57] 10Operations, 10Ops-Access-Requests, 10Research: Server access for Miriam Redi - https://phabricator.wikimedia.org/T176682#3633731 (10DarTar) [23:15:39] !log gerrit2001 reinstalled with stretch, revoked old puppet cert, accepted new puppet cert, initial run that will do base and all the gerrit things at once.. (T168562) [23:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:44] T168562: Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562 [23:21:31] omg, the puppet run just finished.. without an error or anything [23:21:40] did not expect [23:21:48] kind of [23:22:55] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [23:23:35] mutante ^^ [23:23:36] heh [23:24:00] yea, lol, hah [23:27:17] but that's just Apache.. nice enough but not gerrit yet [23:32:39] i like the part that Letsencrypt simply worked, yes :) [23:32:53] with the right hostname [23:33:04] :) [23:42:45] 10Operations, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Backlog): Reimage gerrit2001 as stretch - https://phabricator.wikimedia.org/T168562#3633791 (10Dzahn) gerrit2001 is back up with stretch, puppet did all the things, Apache is up and running, Letsencrypt worked and automatically got cer...