[00:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T0000).
[00:00:04] bmansurov and Amir1: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:15] here
[00:00:19] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[00:00:32] o/
[00:01:05] I can do it
[00:01:34] (PS2) Chad: Enable editmyoptions right for all users on loginwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/340258 (https://phabricator.wikimedia.org/T158871) (owner: Kaldari)
[00:02:07] here
[00:03:21] RainbowSprinkles: how did you go from being a demon to being rainbow sprinkles? This seems like a big change.
[00:03:41] kaldari: Context: https://anyonecanedit.org/cocoa.jpg
[00:03:43] kaldari, I blame ostriches
[00:04:28] :)
[00:06:13] (CR) Chad: [C: 2] Enable editmyoptions right for all users on loginwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/340258 (https://phabricator.wikimedia.org/T158871) (owner: Kaldari)
[00:06:59] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:07:43] (Merged) jenkins-bot: Enable editmyoptions right for all users on loginwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/340258 (https://phabricator.wikimedia.org/T158871) (owner: Kaldari)
[00:07:51] (CR) jenkins-bot: Enable editmyoptions right for all users on loginwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/340258 (https://phabricator.wikimedia.org/T158871) (owner: Kaldari)
[00:09:56] !log demon@tin Synchronized wmf-config/CommonSettings.php: Enable editmyoptions right for all users on loginwiki (duration: 00m 41s)
[00:10:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:07] kaldari: Live everywhere ^
[00:10:18] checking
[00:12:38] RainbowSprinkles: looks good. Thanks!
[00:13:25] yw
[00:13:59] !log demon@tin Synchronized php-1.29.0-wmf.13/extensions/Nuke/Nuke_body.php: Move back to old caller names (duration: 00m 43s)
[00:14:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:14:05] Amir1: You're live everywhere ^^
[00:14:20] RainbowSprinkles: Thanks!
[00:14:50] yw
[00:15:24] !log demon@tin Synchronized php-1.29.0-wmf.13/extensions/MobileFrontend/resources/skins.minerva.base.styles/ui.less: Fix the incorrect magnify glass icon position in lang search (duration: 00m 39s)
[00:15:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:37] bmansurov: You're live everywhere ^^
[00:15:46] RainbowSprinkles: checking
[00:16:05] RainbowSprinkles: works just fine
[00:16:08] (PS2) Dzahn: install/prometheus: add prometheus::ops to bast3002 [puppet] - https://gerrit.wikimedia.org/r/340166 (https://phabricator.wikimedia.org/T156506)
[00:17:51] RainbowSprinkles: works as expected. thank you!
[00:18:13] Awesome
[00:18:27] (CR) jerkins-bot: [V: -1] install/prometheus: add prometheus::ops to bast3002 [puppet] - https://gerrit.wikimedia.org/r/340166 (https://phabricator.wikimedia.org/T156506) (owner: Dzahn)
[00:18:36] And that ends another exciting episode of swat. Tune in tomorrow, same time, for more cash and prizes
[00:21:19] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:21:43] (PS3) Dzahn: install/prometheus: add prometheus::ops to bast3002 [puppet] - https://gerrit.wikimedia.org/r/340166 (https://phabricator.wikimedia.org/T156506)
[00:22:09] PROBLEM - ElasticSearch health check for shards on relforge1001 is CRITICAL: CRITICAL - elasticsearch http://10.64.4.13:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.4.13, port=9200): Read timed out. (read timeout=4)
[00:22:10] PROBLEM - ElasticSearch health check for shards on relforge1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.37.21:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.37.21, port=9200): Read timed out. (read timeout=4)
[00:25:39] i might have overloaded relforge doing some testing with elasticsearch 5 ... safe to ignore
[00:26:09] RECOVERY - ElasticSearch health check for shards on relforge1002 is OK: OK - elasticsearch status relforge-eqiad: status: green, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 46, task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 100.0, active_shards: 47, initializ
[00:27:09] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:28:02] Operations, Discovery, Wikimedia-Portals, Discovery-Portal-Sprint, and 2 others: https://www.wikipedia.org/ portal doesn't have any text - https://phabricator.wikimedia.org/T158782#3059905 (Dzahn) >>! In T158782#3057319, @Gehel wrote: > As I understand the situation (helped by enabling rewrite lo...
[00:29:09] PROBLEM - ElasticSearch health check for shards on relforge1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.37.21:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.37.21, port=9200): Read timed out. (read timeout=4)
[00:29:43] (CR) Dzahn: [C: 2] install/prometheus: add prometheus::ops to bast3002 [puppet] - https://gerrit.wikimedia.org/r/340166 (https://phabricator.wikimedia.org/T156506) (owner: Dzahn)
[00:29:47] !log restart elasticsearch on relforge1001, putting too much load on the machine got it stuck in a GC spiral with 1minute+ collections
[00:29:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:33:56] !log restart elasticsearch on relforge1002, putting too much load on the machine got it stuck in a GC spiral with 1minute+ collections
[00:34:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:59] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[00:39:47] (PS4) Dzahn: prometheus: temporary rsync server for metrics migration [puppet] - https://gerrit.wikimedia.org/r/330348 (https://phabricator.wikimedia.org/T148408) (owner: Filippo Giunchedi)
[00:39:59] RECOVERY - ElasticSearch health check for shards on relforge1001 is OK: OK - elasticsearch status relforge-eqiad: status: red, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 45, task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 97.8723404255, active_shards: 46, ini
[00:40:09] RECOVERY - ElasticSearch health check for shards on relforge1002 is OK: OK - elasticsearch status relforge-eqiad: status: red, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 45, task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 97.8723404255, active_shards: 46, ini
[00:46:15] Operations, Annual-Report, Security-Reviews: add subdomain for annual report 2016 - https://phabricator.wikimedia.org/T151798#3059934 (Dzahn) Resolved>Open re-opening. dev sent a follow-up fix for social network sharing https://gerrit.wikimedia.org/r/#/c/340263/
[00:47:30] (CR) Dzahn: "Jeeeenkins.. would you please" [puppet] - https://gerrit.wikimedia.org/r/330348 (https://phabricator.wikimedia.org/T148408) (owner: Filippo Giunchedi)
[00:48:09] (CR) Dzahn: [V: 2 C: 2] prometheus: temporary rsync server for metrics migration [puppet] - https://gerrit.wikimedia.org/r/330348 (https://phabricator.wikimedia.org/T148408) (owner: Filippo Giunchedi)
[00:51:19] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[00:52:30] (CR) Thcipriani: [C: 1] "I have qualms about this one. They will mostly be alleviated when branch validation patch (in progress) lands." [mediawiki-config] - https://gerrit.wikimedia.org/r/336901 (owner: Chad)
[00:55:09] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[00:55:39] PROBLEM - Check systemd state on prometheus2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:56:57] (CR) Chad: [C: 2] Scap clean: Automate purging of old deployment branches from gerrit [mediawiki-config] - https://gerrit.wikimedia.org/r/336901 (owner: Chad)
[00:57:49] PROBLEM - Check systemd state on prometheus1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:58:45] (CR) Dzahn: "Could not start Service[ferm]: because " DNS query for 'bast3001.wikimedia.orgbast3002.wikimedia.org' failed: NXDOMAIN" oops" [puppet] - https://gerrit.wikimedia.org/r/330348 (https://phabricator.wikimedia.org/T148408) (owner: Filippo Giunchedi)
[00:58:47] (Merged) jenkins-bot: Scap clean: Automate purging of old deployment branches from gerrit [mediawiki-config] - https://gerrit.wikimedia.org/r/336901 (owner: Chad)
[00:58:56] (CR) jenkins-bot: Scap clean: Automate purging of old deployment branches from gerrit [mediawiki-config] - https://gerrit.wikimedia.org/r/336901 (owner: Chad)
[00:59:06] well, this i can accept as a reason " DNS query for 'bast3001.wikimedia.orgbast3002.wikimedia.org' failed: NXDOMAIN"" :p
[00:59:09] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:59:29] PROBLEM - Check systemd state on bast3002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:01:30] !log demon@tin Synchronized scap/plugins/clean.py: No-op, more cleanups for clean.py (duration: 00m 42s)
[01:01:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:06:49] PROBLEM - Check systemd state on prometheus1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:07:09] PROBLEM - Check systemd state on prometheus1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:07:50] ACKNOWLEDGEMENT - Check systemd state on bast3002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn ferm - fix is coming
[01:07:50] ACKNOWLEDGEMENT - Check systemd state on prometheus1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn ferm - fix is coming
[01:07:50] ACKNOWLEDGEMENT - Check systemd state on prometheus1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn ferm - fix is coming
[01:07:50] ACKNOWLEDGEMENT - Check systemd state on prometheus1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn ferm - fix is coming
[01:07:50] ACKNOWLEDGEMENT - Check systemd state on prometheus2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn ferm - fix is coming
[01:11:39] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:12:19] PROBLEM - Check systemd state on prometheus1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:13:02] (PS1) Dzahn: prometheus: fix ferm rules for rsync and add IPv6 [puppet] - https://gerrit.wikimedia.org/r/340267
[01:17:29] PROBLEM - Check systemd state on bast3001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:17:50] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/340267 (owner: Dzahn)
[01:19:24] (CR) Dzahn: [V: 2 C: 2] "can't wait that long for jenkins when fixing a broken puppet run :/" [puppet] - https://gerrit.wikimedia.org/r/340267 (owner: Dzahn)
[01:19:29] PROBLEM - Check systemd state on prometheus2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:23:39] RECOVERY - Check systemd state on prometheus2001 is OK: OK - running: The system is fully operational
[01:24:13] (CR) 20after4: [C: 1] Phabricator: Migrate to base::service_unit for ssh-phab [puppet] - https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[01:24:29] RECOVERY - Check systemd state on bast3002 is OK: OK - running: The system is fully operational
[01:25:49] RECOVERY - Check systemd state on prometheus1004 is OK: OK - running: The system is fully operational
[01:28:09] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[01:30:52] (CR) Dzahn: "this fixed the puppet run, ferm service starts, and:" [puppet] - https://gerrit.wikimedia.org/r/340267 (owner: Dzahn)
[01:31:30] RECOVERY - Check systemd state on bast3001 is OK: OK - running: The system is fully operational
[01:34:59] RECOVERY - Check systemd state on prometheus1001 is OK: OK - running: The system is fully operational
[01:35:09] RECOVERY - Check systemd state on prometheus1002 is OK: OK - running: The system is fully operational
[01:36:19] RECOVERY - Check systemd state on prometheus1003 is OK: OK - running: The system is fully operational
[01:36:29] RECOVERY - Check systemd state on prometheus2002 is OK: OK - running: The system is fully operational
[01:38:29] Operations, ops-codfw, hardware-requests, Patch-For-Review, User-Elukey: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675#3012720 (Dzahn) please remove mc2001 thru mc2007 from Icinga - they are reported as CRITical "host down" alerts and were als...
[01:39:43] ACKNOWLEDGEMENT - Host mc2001 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2002 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2003 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2004 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2005 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2006 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:39:43] ACKNOWLEDGEMENT - Host mc2007 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T157675
[01:40:54] ^ we should notice this stuff more - 7 hosts over 8 hours but icinga-wm wasn't telling us all day?
[01:40:59] PROBLEM - puppet last run on install1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:41:20] even though in web ui it was not acked or in downtime
[01:42:41] !log mw1198 - restart hhvm
[01:42:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:42:59] RECOVERY - puppet last run on install1002 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[01:43:08] Puppet, Continuous-Integration-Infrastructure: Need a better way of testing puppet patches for contint/integration stuff - https://phabricator.wikimedia.org/T126370#3060034 (scfc) >>! In T126370#2046565, @hashar wrote: > @scfc great teaser. I would like to know more about environments. > > Is all that lo...
[01:44:19] RECOVERY - HHVM rendering on mw1198 is OK: HTTP OK: HTTP/1.1 200 OK - 73352 bytes in 0.223 second response time
[01:44:19] RECOVERY - Apache HTTP on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.058 second response time
[01:45:09] RECOVERY - Nginx local proxy to apache on mw1198 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.032 second response time
[01:47:13] (CR) Dzahn: "fixed with https://gerrit.wikimedia.org/r/#/c/340267/" [puppet] - https://gerrit.wikimedia.org/r/330348 (https://phabricator.wikimedia.org/T148408) (owner: Filippo Giunchedi)
[01:48:12] (CR) Dzahn: "please remove from Icinga first (puppet node clean/deactivate)" [dns] - https://gerrit.wikimedia.org/r/340195 (owner: Papaul)
[02:00:55] Puppet, Patch-For-Review: Inconsistent groups for Git repositories with role::puppetmaster::standalone - https://phabricator.wikimedia.org/T152060#3060062 (scfc) Open>Resolved
[02:18:40] !log rsyncing prometheus metrics data from bast3001 to bast3002 (T156506)
[02:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:18:46] T156506: Replace bast3001 - https://phabricator.wikimedia.org/T156506
[02:25:19] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:30:36] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.13) (duration: 11m 40s)
[02:30:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:39] PROBLEM - Check for valid instance states on labnodepool1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:35:56] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Feb 28 02:35:56 UTC 2017 (duration 5m 20s)
[02:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:41:32] (PS2) Legoktm: Enable Linter on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/335052
[02:48:17] (PS1) Dzahn: switch prometheus.eqiad to bast3002 [dns] - https://gerrit.wikimedia.org/r/340272
[02:49:52] (PS2) Dzahn: switch prometheus.eqiad to bast3002 [dns] - https://gerrit.wikimedia.org/r/340272 (https://phabricator.wikimedia.org/T156506)
[02:53:19] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[02:56:25] Operations, Labs: Backport python-ldap3 package from Utopic to Precise / Trusty - https://phabricator.wikimedia.org/T101824#3060155 (scfc) Open>declined >>! In T101824#1737022, @Dzahn wrote: > What's still missing? Actually this issue is mostly obsolete. While `python-ldap3` was backported to T...
[03:23:09] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 695.07 seconds
[03:26:09] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 296.84 seconds
[03:26:39] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:55:39] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[04:05:29] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:33:29] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[04:55:09] PROBLEM - puppet last run on relforge1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:00:29] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:16:39] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:22:09] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[05:29:29] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[05:44:39] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:09:59] PROBLEM - puppet last run on mw1273 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:20:31] (CR) 20after4: "I've exported this and 339002 over to https://phabricator.wikimedia.org/source/keyholder" [puppet] - https://gerrit.wikimedia.org/r/338984 (https://phabricator.wikimedia.org/T158660) (owner: Volans)
[06:32:10] (CR) Giuseppe Lavagetto: [C: -1] "Minor nit, but overall LGTM" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) (owner: Paladox)
[06:38:09] RECOVERY - puppet last run on mw1273 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:44:29] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:00:32] !log Deploy alter table enwiki.revision db2034 - T132416
[07:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:00:40] T132416: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416
[07:02:59] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=728.00 Read Requests/Sec=333.20 Write Requests/Sec=689.60 KBytes Read/Sec=41921.60 KBytes_Written/Sec=5950.00
[07:12:10] !log run pt-table-checksum on bgwiki (s2) - T154485
[07:12:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:12:15] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485
[07:13:29] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[07:16:59] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=190.40 Read Requests/Sec=164.90 Write Requests/Sec=41.30 KBytes Read/Sec=1834.00 KBytes_Written/Sec=6108.80
[07:33:04] Operations, ops-eqiad, DBA: Degraded RAID on db1053 - https://phabricator.wikimedia.org/T151465#3060322 (Marostegui) a:Cmjohnson Hey Chris! Would you have time to replace this disk today? Thanks!
[07:34:46] Operations, ops-eqiad: Degraded RAID on db1060 - https://phabricator.wikimedia.org/T158193#3060326 (Marostegui) We should replace these two disks, as they had media errors. I would suggest we do one at a time. - Replace one - Wait for the RAID to rebuild - Replace the second one.
[07:35:08] Operations, ops-eqiad: Degraded RAID on db1060 - https://phabricator.wikimedia.org/T158193#3060327 (Marostegui) stalled>Open
[07:35:28] Operations, ops-eqiad, DBA: Degraded RAID on db1060 - https://phabricator.wikimedia.org/T158193#3029364 (Marostegui)
[07:37:59] (PS1) Jcrespo: Revert "mariadb: Depool db1051 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340284
[07:38:25] (PS2) Jcrespo: Revert "mariadb: Depool db1051 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340284
[07:41:09] !log Deploy alter table s4.user_groups - T155605
[07:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:15] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605
[08:05:27] apergos: Hey, sorry to bother but you haven't made an announcement on Monday about this: https://gerrit.wikimedia.org/r/#/dashboard/self
[08:05:32] oops
[08:05:43] https://gerrit.wikimedia.org/r/#/c/339332/
[08:12:03] Operations, ops-codfw: troubleshoot drac on ms-be2010.codfw.wmnet - https://phabricator.wikimedia.org/T155690#3060353 (fgiunchedi) We can decom the old ms-be machines as soon as the new ms-be hardware is fully in service. Specifically for ms-be2010 I wouldn't spend too much time after fixing its idrac, i...
[08:15:01] !log Deploy alter table s5 dewiki.user_groups - T155605
[08:15:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:07] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605
[08:18:29] (CR) Jcrespo: [C: 2] Revert "mariadb: Depool db1051 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340284 (owner: Jcrespo)
[08:18:58] !log Deploy alter table s5 wikidatawiki.user_groups - T155605
[08:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:19:10] Operations, ops-codfw, hardware-requests, Patch-For-Review, User-Elukey: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675#3060358 (elukey) >>! In T157675#3060032, @Dzahn wrote: > please remove mc2001 thru mc2007 from Icinga - they are reported as...
[08:19:33] (Merged) jenkins-bot: Revert "mariadb: Depool db1051 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340284 (owner: Jcrespo)
[08:19:43] (CR) jenkins-bot: Revert "mariadb: Depool db1051 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340284 (owner: Jcrespo)
[08:22:23] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 after maintenance (duration: 00m 41s)
[08:22:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:23:39] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:24:52] !log run pt-table-checksum on bgwiktionary (s2) - T154485
[08:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:57] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485
[08:28:35] (PS1) Jcrespo: Revert "mariadb: Depool db1045 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340285
[08:30:59] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=2259.90 Read Requests/Sec=2888.20 Write Requests/Sec=0.50 KBytes Read/Sec=22576.40 KBytes_Written/Sec=11.60
[08:35:04] !log Deploy alter table s6 (frwiki,jawiki,ruwiki).user_groups - T155605
[08:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:35:09] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605
[08:35:39] RECOVERY - Check for valid instance states on labnodepool1001 is OK: nodepool state management is OK
[08:35:59] (CR) Jcrespo: [C: 2] Revert "mariadb: Depool db1045 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340285 (owner: Jcrespo)
[08:36:44] Operations: PuppetDB is auto-deactivating hosts - https://phabricator.wikimedia.org/T159163#3058497 (fgiunchedi) I'm +1 on not deactivating hosts automatically in puppetdb. Disabling auto-deactivation would also let us automatically catch hosts that have been decommissioned in DNS but haven't been node-clean...
[08:37:24] (Merged) jenkins-bot: Revert "mariadb: Depool db1045 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340285 (owner: Jcrespo)
[08:37:24] !log nodepool deleted alien instances 541585 541586 and 541587
[08:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:37:33] (CR) jenkins-bot: Revert "mariadb: Depool db1045 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340285 (owner: Jcrespo)
[08:38:21] marostegui, thanks for doing all this!
[08:38:22] I was wondering, what is the significance of the fact that some user_groups tables have int(5) rather than int(10) for ug_user?
[08:38:26] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1045 after maintenance (duration: 00m 40s)
[08:38:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:42] tto: none, the "(X)" in an int column is just visualization, normally you should always add an int column without any display value
[08:39:46] Operations, Revision-Scoring-As-A-Service-Backlog: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#3060396 (fgiunchedi) >>! In T139372#3051587, @Ladsgroup wrote: > Is there any plans on syncing codfw/eqiad redis nodes? It would be needed if the codfw starts to get traffic....
[08:39:57] I see, thank you :)
[08:40:08] you are welcome! :)
[08:41:21] tto, the agreement on WMF/mediawiki hackers
[08:41:25] as I was told
[08:41:53] is never define it on migrations, because it is a waste of discussion topics that, as manuel said, has no significance
[08:42:18] Makes sense. If it doesn't do anything, may as well leave it alone
[08:42:51] I think it was tim that told me discussions ongoing if certain filed should be 8 or 9 characters
[08:43:00] *field
[08:43:36] !log installing python-crypto security updates
[08:43:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:59] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=194.60 Read Requests/Sec=148.00 Write Requests/Sec=0.60 KBytes Read/Sec=1416.40 KBytes_Written/Sec=16.80
[08:46:43] (Draft1) Paladox: Gerrit: Make gerritbot report the repo in the comment [puppet] - https://gerrit.wikimedia.org/r/340286
[08:46:47] (PS2) Paladox: Gerrit: Make gerritbot report the repo in the comment [puppet] - https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202)
[08:50:09] PROBLEM - puppet last run on es1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:51:39] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[08:51:47] Operations, Discovery, Wikimedia-Portals, Discovery-Portal-Sprint, and 2 others: https://www.wikipedia.org/ portal doesn't have any text - https://phabricator.wikimedia.org/T158782#3060402 (Gehel) >>! In T158782#3058926, @Krinkle wrote: >>>! In T158782#3057319, @Gehel wrote: >> >> `https://www.w...
[08:56:15] Operations, Discovery, Wikimedia-Portals, Discovery-Portal-Sprint, and 2 others: https://www.wikipedia.org/ portal doesn't have any text - https://phabricator.wikimedia.org/T158782#3060407 (Gehel) >>! In T158782#3059905, @Dzahn wrote: >>>! In T158782#3057319, @Gehel wrote: >> As I understand the...
[08:56:18] (03PS1) 10Jcrespo: mariadb: Depool db1055 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340287 (https://phabricator.wikimedia.org/T147747) [08:59:33] !log run pt-table-checksum on cswiki (s2) - T154485 [08:59:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:59:38] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485 [09:03:06] _joe_: Hi! It looks like tools-puppetmaster isn't running puppet for some reason, and puppet is failed across all of tools - know how i can restart puppet there? it is at tools-puppetmaster-02.tools.eqiad.wmflabs btw [09:03:39] !log Deploy alter table s1 (enwiki).user_groups - T155605 [09:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:45] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [09:03:47] <_joe_> madhuvishy: I'll take a look in a minute [09:04:06] thanks! [09:05:18] (03CR) 10Gehel: portals: do not rewrite 404 errors (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/339657 (https://phabricator.wikimedia.org/T158782) (owner: 10Gehel) [09:05:26] <_joe_> madhuvishy: there is some misconfig in apache, see systemctl status apache2.service [09:05:36] <_joe_> I'll try to figure out what's wrong and why [09:06:11] _joe_: aah yes i see it [09:06:56] !log running alter table on db2042 T147747 [09:07:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:07:07] okay i'm half asleep let me know if i can help! 
thank you so much [09:07:41] 06Operations, 10Revision-Scoring-As-A-Service-Backlog, 10hardware-requests: Create one oresrdb VM in codfw - https://phabricator.wikimedia.org/T159207#3060424 (10fgiunchedi) [09:09:03] PROBLEM - LVS HTTPS IPv4 on ms-fe.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.27 and port 443: Connection refused [09:09:28] <_joe_> godog: ^^ [09:09:31] <_joe_> that you? [09:09:35] it is just the expiration of the known issue? [09:09:49] <_joe_> I know it's not in prod [09:10:01] yes, 4 day alarm [09:10:05] it is an expiration [09:10:26] yeah it has expired, sorry about that [09:10:35] np [09:10:40] <_joe_> madhuvishy: no problem, I can take care of it [09:10:40] I'll silence/ack [09:13:09] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:13:30] _joe_: <3 /me sleeps [09:15:22] 06Operations: Switch to predictable network interface names? - https://phabricator.wikimedia.org/T158429#3060446 (10MoritzMuehlenhoff) The old Debian-specific sticky rules are in fact only supported for the stretch release cycle, the NEWS file for udev explicity calls that existing systems need to be migrated:... [09:18:09] RECOVERY - puppet last run on es1016 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [09:20:59] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=5010.10 Read Requests/Sec=3582.80 Write Requests/Sec=8.70 KBytes Read/Sec=22040.80 KBytes_Written/Sec=192.00 [09:23:49] PROBLEM - puppet last run on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:26:03] (03CR) 10Muehlenhoff: jenkins: migrate to systemd (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/337404 (owner: 10Hashar) [09:26:49] RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 8 minutes ago with 0 failures [09:28:03] <_joe_> madhuvishy: solved, the issue was that somehow the puppetmaster had an upgraded apache2 package coming from debian-security and not from us, which caused ssl to be linked against the wrong version [09:28:17] <_joe_> madhuvishy: the puppetmaster works again [09:30:09] PROBLEM - puppet last run on praseodymium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:33:12] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1055 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340287 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [09:34:35] (03Merged) 10jenkins-bot: mariadb: Depool db1055 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340287 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [09:34:47] (03CR) 10jenkins-bot: mariadb: Depool db1055 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340287 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [09:34:59] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=183.40 Read Requests/Sec=146.10 Write Requests/Sec=35.40 KBytes Read/Sec=1774.40 KBytes_Written/Sec=322.00 [09:38:54] !log Deploy alter table s7 on all wikis for table user_groups - T155605 [09:38:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:38:59] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [09:42:09] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [09:43:27] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1055 for maintenance (duration: 
00m 40s) [09:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:47:18] !log running alter table on db1055 T147747 [09:47:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:49:38] (03CR) 10Hashar: jenkins: migrate to systemd (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/337404 (owner: 10Hashar) [09:49:56] moritzm: thanks for the jenkins/systemd review :-}}} [09:50:53] moritzm: I replied but in short systemd does not do variable interpolation when loading a file with EnvironmentFile . Eg a default having: JENKINS_ARGS="--httpPort $HTTP_PORT" would end up spawning a process with the literal "HTTP_PORT" [09:50:56] by design :( [09:54:56] yeah, I had to do a preexec script [09:55:07] to fix that for haproxy [09:56:17] but why not simply use the actual value in the ERB template? why does it need to pass $HTTP_PORT to begin with? [09:56:53] for me it was because it depended on other things at run time, not at puppet time [09:57:22] 06Operations, 07Puppet: PuppetDB is auto-deactivating hosts - https://phabricator.wikimedia.org/T159163#3060544 (10Peachey88) [09:58:19] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:00:12] !log restart zookeeper on conf1001 [10:00:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:00:19] PROBLEM - puppet last run on mw1259 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [10:09:44] !log Deploy alter table s2 on all wikis for table user_groups - T155605 [10:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:52] T155605: Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605 [10:17:25] (03PS15) 10Hashar: jenkins: migrate to systemd [puppet] - 10https://gerrit.wikimedia.org/r/337404 [10:18:13] jynus: yup I actually read your haproxy systemd file. IIrc you craft a .env file under /run for that purpose. I end up just passing the settings directly in ExecStart which looks good enough for me :} [10:19:24] I am quite happy about systemd overall [10:19:29] I start seeing the benefits of using it [10:19:56] (03PS1) 10Jcrespo: mariadb: Depool db1026 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340294 (https://phabricator.wikimedia.org/T147747) [10:21:56] 06Operations, 06Discovery, 10Wikimedia-Portals, 03Discovery-Portal-Sprint, and 2 others: https://www.wikipedia.org/ portal doesn't have any text - https://phabricator.wikimedia.org/T158782#3060621 (10Gehel) The ErrorDocument directive being triggered is the one in `/etc/apache2/sites-available/01-main.conf... 
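A minimal sketch of the options discussed above for the jenkins/systemd migration, assuming a hypothetical unit. The env file path, jar path and port 8080 are illustrative assumptions, not the actual Wikimedia Puppet template. As hashar notes, systemd does not expand `$HTTP_PORT` inside an `EnvironmentFile=`. It does substitute variables referenced on the `ExecStart=` line: `${VAR}` always becomes exactly one argument, while a bare `$VAR` standing as its own word is split on whitespace.

```ini
# Hypothetical /etc/default/jenkins, loaded via EnvironmentFile= below.
# systemd does not expand $HTTP_PORT inside this file, so JENKINS_ARGS
# would reach the process as the literal text "--httpPort $HTTP_PORT":
#
#   HTTP_PORT=8080
#   JENKINS_ARGS="--httpPort $HTTP_PORT"

[Service]
Type=simple
EnvironmentFile=/etc/default/jenkins

# Option A: reference each variable from ExecStart instead, where systemd
# does substitute it; "${HTTP_PORT}" becomes part of a single argument.
ExecStart=/usr/bin/java -jar /usr/share/jenkins/jenkins.war --httpPort=${HTTP_PORT}

# Option B (what hashar ended up doing): have the Puppet ERB template write
# the final values straight into ExecStart, with no variables at all.
# Only one ExecStart= is allowed for Type=simple, so it is commented here:
# ExecStart=/usr/bin/java -jar /usr/share/jenkins/jenkins.war --httpPort=8080
```

jynus's haproxy approach (a pre-exec helper that writes a `.env` file under `/run`) is the remaining option. It fits when the values are only known at run time rather than when Puppet renders the unit.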
[10:23:06] (03PS3) 10Gehel: portals: do not rewrite 404 errors [puppet] - 10https://gerrit.wikimedia.org/r/339657 (https://phabricator.wikimedia.org/T158782) [10:23:28] !log run pt-table-checksum on enwikiquote (s2) - T154485 [10:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:23:34] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485 [10:25:28] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1026 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340294 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [10:26:49] (03Merged) 10jenkins-bot: mariadb: Depool db1026 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340294 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [10:26:57] (03CR) 10jenkins-bot: mariadb: Depool db1026 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340294 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [10:28:01] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1026 for maintenance (duration: 00m 39s) [10:28:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:20] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:30:39] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:35:24] !log restart zookeeper on conf1003 [10:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:36:15] (03CR) 10Muehlenhoff: [C: 031] "Looks fine (Type is still set, but it doesn't hurt to be explicit either)" [puppet] - 10https://gerrit.wikimedia.org/r/337404 (owner: 10Hashar) [10:36:39] moritzm: ah sorry I forgot about Type=simple :-/ [10:37:09] PROBLEM - puppet last run on mw1293 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [10:41:49] PROBLEM - carbon-local-relay metric drops on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [10:42:49] RECOVERY - carbon-local-relay metric drops on graphite2001 is OK: OK: Less than 1.00% above the threshold [25.0] [10:47:12] (03PS2) 10Phuedx: Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) [10:47:32] 06Operations, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, 10Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3060702 (10Lea_Lacroix_WMDE) Yeah, I don't think that we need to announce this globally. Maybe add it in... [10:53:32] !log run pt-table-checksum on enwiktionary (s2) - T154485 [10:53:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:38] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485 [10:56:41] !log restart zookeeper on conf1002 [10:56:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:39] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:01:09] PROBLEM - DPKG on thorium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [11:02:09] RECOVERY - DPKG on thorium is OK: All packages OK [11:06:09] RECOVERY - puppet last run on mw1293 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:07:28] (03CR) 10Aklapper: [C: 031] "lgtm, thx" [puppet] - 10https://gerrit.wikimedia.org/r/317990 (owner: 10Alex Monk) [11:07:33] (03PS1) 10Jcrespo: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340295 (https://phabricator.wikimedia.org/T147747) [11:08:00] (03PS8) 10Paladox: Phabricator: 
Migrate to base::service_unit for ssh-phab [puppet] - 10https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) [11:08:11] (03PS9) 10Paladox: Phabricator: Migrate to base::service_unit for ssh-phab [puppet] - 10https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) [11:08:45] (03PS2) 10Jcrespo: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340295 (https://phabricator.wikimedia.org/T147747) [11:09:12] (03CR) 10Paladox: Phabricator: Migrate to base::service_unit for ssh-phab (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) (owner: 10Paladox) [11:09:14] (03CR) 10jerkins-bot: [V: 04-1] Phabricator: Migrate to base::service_unit for ssh-phab [puppet] - 10https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) (owner: 10Paladox) [11:09:46] (03PS16) 10Hashar: jenkins: migrate to systemd [puppet] - 10https://gerrit.wikimedia.org/r/337404 [11:09:53] (03PS10) 10Paladox: Phabricator: Migrate to base::service_unit for ssh-phab [puppet] - 10https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928) [11:13:56] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340295 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [11:15:36] (03Merged) 10jenkins-bot: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340295 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [11:16:38] (03CR) 10Hashar: jenkins: migrate to systemd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/337404 (owner: 10Hashar) [11:16:48] (03CR) 10jenkins-bot: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340295 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [11:18:00] !log jynus@tin Synchronized 
wmf-config/db-eqiad.php: Depool db1053 for maintenance (duration: 00m 40s) [11:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:20:09] PROBLEM - carbon-local-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [11:21:09] RECOVERY - carbon-local-relay metric drops on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [11:22:41] !log running alter table on db1053 T147747 [11:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:19] PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1146, Errmsg: Error Table flaggedrevs_labswikimedia.user_groups doesnt exist on query. Default database: flaggedrevs_labswikimedia. [Query snipped] [11:37:13] also db1069:3313 broken, marostegui [11:37:24] same thing? [11:37:25] let me see [11:38:09] indeed [11:38:19] I will take care of those, there will be probably more of those :( [11:38:26] check also db1095, in case [11:38:57] yeah, it is broken [11:39:34] and dbstore1001 will break tomorrow, and I will check 2001 too [11:40:35] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet [11:40:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:50] !log depooling wdqs1003 for investigation (high 5xx rate) [11:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:41:19] RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes [11:42:04] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3060800 (10ema) [11:42:09] PROBLEM - carbon-local-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [11:42:49] PROBLEM - 
carbon-local-relay metric drops on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [11:43:07] known ^ also expired downtime [11:43:09] RECOVERY - carbon-local-relay metric drops on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [11:43:29] PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [11:43:49] RECOVERY - carbon-local-relay metric drops on graphite2001 is OK: OK: Less than 1.00% above the threshold [25.0] [11:47:23] !log restarting wdqs-blazegraph on wdqs1003 [11:47:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:51:29] PROBLEM - High lag on wdqs1003 is CRITICAL: CRITICAL: 86.96% of data above the critical threshold [1800.0] [11:57:09] PROBLEM - puppet last run on db1086 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:09:19] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:10:45] (03PS1) 10Muehlenhoff: Update to 4.4.52 [debs/linux44] - 10https://gerrit.wikimedia.org/r/340300 [12:12:29] RECOVERY - High lag on wdqs1003 is OK: OK: Less than 30.00% above the threshold [600.0] [12:25:09] RECOVERY - puppet last run on db1086 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [12:29:04] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1055 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340303 [12:38:11] (03CR) 10Fdans: Changes to perf consumer of event logging events (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: 10Nuria) [12:39:19] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [12:39:35] 06Operations, 06Performance-Team, 10Thumbor: Make Thumbor IM engine based on a subprocess - https://phabricator.wikimedia.org/T149903#3060857 (10Gilles) [12:42:26] (03CR) 10Fdans: Changes to perf consumer of event logging events (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: 10Nuria) [12:42:50] PROBLEM - carbon-local-relay metric drops on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [12:43:49] RECOVERY - carbon-local-relay metric drops on graphite2001 is OK: OK: Less than 1.00% above the threshold [25.0] [12:46:04] (03CR) 10Muehlenhoff: [C: 032] Update to 4.4.52 [debs/linux44] - 10https://gerrit.wikimedia.org/r/340300 (owner: 10Muehlenhoff) [12:48:59] <_joe_> !log flushed memcached in codfw, restarting hhvm on appserver to flush APC in order to test warmup script [12:49:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:39] PROBLEM - HHVM rendering on mw2176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:49] PROBLEM - Nginx local proxy to apache 
on mw2102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:49] PROBLEM - HHVM rendering on mw2097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:53:49] (03PS1) 10Gilles: Upgrade to 0.1.35 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/340306 (https://phabricator.wikimedia.org/T149903) [12:54:29] RECOVERY - HHVM rendering on mw2176 is OK: HTTP OK: HTTP/1.1 200 OK - 74108 bytes in 0.212 second response time [12:54:39] RECOVERY - Nginx local proxy to apache on mw2102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.193 second response time [12:54:39] RECOVERY - HHVM rendering on mw2097 is OK: HTTP OK: HTTP/1.1 200 OK - 74110 bytes in 0.267 second response time [12:56:50] _joe_: might be related? ^^^ [12:57:47] <_joe_> volans: yes [12:58:07] but I guess not expected :D [13:02:20] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:03:31] !log ran namespaceDupes on meta to fix some Config pages [13:03:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:25] Reedy: you read my mind [13:04:48] I was wondering how bad meta was wrt namespaceDupes [13:05:10] Not bad [13:05:20] id=6721695 ns=0 dbk=User:Cyberbot_I/Run/Meta-cont *** dest title exists and --add-prefix not specified [13:05:21] id=6721711 ns=0 dbk=User:Cyberbot_I/Run/Meta-daily *** dest title exists and --add-prefix not specified [13:05:21] id=6722577 ns=0 dbk=Translations:Terms_of_use/Paid_contributions_amendment/81/bg *** dest title exists and --add-prefix not specified [13:05:28] Are the only ones that remain [13:06:01] Though [13:06:01] | 10111687 | 1 | Config:FlowReportcard [13:07:05] that flow %&!!#@@ still annoying people [13:15:21] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1055 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340303 [13:19:47] (03PS1) 10Hashar: Dummy secret nova/osstackcanary 
[labs/private] - 10https://gerrit.wikimedia.org/r/340309 [13:20:47] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1055 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340303 (owner: 10Jcrespo) [13:21:11] (03CR) 10Hashar: "PPC https://puppet-compiler.wmflabs.org/5599/ There a few hosts failures but that seems unrelated." [puppet] - 10https://gerrit.wikimedia.org/r/337411 (owner: 10Hashar) [13:22:13] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1055 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340303 (owner: 10Jcrespo) [13:22:53] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1055 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340303 (owner: 10Jcrespo) [13:22:57] jouncebot: next [13:22:57] In 0 hour(s) and 37 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T1400) [13:23:19] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1055 after maintenance (duration: 00m 39s) [13:23:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:49] PROBLEM - puppet last run on prometheus2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:30:19] RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [13:38:09] PROBLEM - carbon-local-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [13:38:49] PROBLEM - carbon-local-relay metric drops on graphite2001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [100.0] [13:39:49] RECOVERY - carbon-local-relay metric drops on graphite2001 is OK: OK: Less than 1.00% above the threshold [25.0] [13:40:09] RECOVERY - carbon-local-relay metric drops on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [13:42:39] PROBLEM - puppet last run on darmstadtium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:43:39] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 175194 [13:49:43] jouncebot: next [13:49:43] In 0 hour(s) and 10 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T1400) [13:49:49] jouncebot: now [13:49:49] No deployments scheduled for the next 0 hour(s) and 10 minute(s) [13:51:22] phuedx: do you want to deploy your commit yourself? cc hashar [13:51:42] (03PS3) 10Zfilipin: Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) (owner: 10Phuedx) [13:52:09] phuedx: for beta cluster, you can just CR+2 and rebase on tin :} [13:52:14] no need to SWAT them imho [13:52:38] hashar: isn't there a scap? 
[13:52:48] (it's been a while since i've been on this side) [13:53:29] phuedx: jenkins will run a scap on beta [13:53:37] neato [13:53:48] duh [13:53:49] RECOVERY - puppet last run on prometheus2001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [13:53:51] 'course it does [13:53:53] sorry [13:53:54] hashar: I've just added a patch to swat for milimetric [13:53:59] brain's not working [13:54:01] But I'm happy to deploy it [13:54:35] @Reedy sure, go ahead as soon as the swat starts? [13:54:45] WFM :) [13:54:59] (03PS2) 10Ottomata: Configure analytics cluster nodes to use thirdparty/cloudera apt component [puppet] - 10https://gerrit.wikimedia.org/r/337877 (https://phabricator.wikimedia.org/T155726) [13:55:02] elukey: ^ [13:55:52] (03CR) 10jerkins-bot: [V: 04-1] Configure analytics cluster nodes to use thirdparty/cloudera apt component [puppet] - 10https://gerrit.wikimedia.org/r/337877 (https://phabricator.wikimedia.org/T155726) (owner: 10Ottomata) [13:56:08] oo [13:56:23] running pcc [13:56:30] was about to as well but got that -1 [13:56:31] (03CR) 10Muehlenhoff: [C: 031] Configure analytics cluster nodes to use thirdparty/cloudera apt component [puppet] - 10https://gerrit.wikimedia.org/r/337877 (https://phabricator.wikimedia.org/T155726) (owner: 10Ottomata) [13:56:37] oh its lint [13:57:06] (03PS3) 10Ottomata: Configure analytics cluster nodes to use thirdparty/cloudera apt component [puppet] - 10https://gerrit.wikimedia.org/r/337877 (https://phabricator.wikimedia.org/T155726) [13:57:45] !log running alter table on db1036 T147747 [13:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:59:24] self +2ing beta cluster config changes isn't a faux pas? 
;) [14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T1400). Please do the needful. [14:00:05] phuedx: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [14:00:10] Hello. [14:00:13] phuedx: not really. Just needs merging on tin, and syncing for consistency [14:02:27] !log reedy@tin Synchronized php-1.29.0-wmf.13/extensions/Dashiki/extension.json: Register JsonConfigModels (duration: 00m 42s) [14:02:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:56] (03CR) 10Phuedx: [C: 032] Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) (owner: 10Phuedx) [14:03:06] phuedx: the -labs.php files are not processed on production [14:03:11] so that is essentially a noop for prod [14:03:28] (03CR) 10Hashar: [C: 031] Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) (owner: 10Phuedx) [14:04:11] (03Merged) 10jenkins-bot: Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) (owner: 10Phuedx) [14:04:25] (03CR) 10jenkins-bot: Make Page Previews use RESTBase on Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/339475 (https://phabricator.wikimedia.org/T156800) (owner: 10Phuedx) [14:04:43] Reedy: will you deploy the Dashiki change or should I? [14:04:52] hashar: It's already done ;P [14:04:56] ;} [14:05:01] rebased on tin [14:05:28] scap sync-file? 
[14:07:14] phuedx: yes [14:07:27] though since it is a noop to prod we usually don't bother syncing them :} [14:07:56] (or at least **I** don't bother syncing -labs.php files) [14:08:04] !log phuedx@tin Synchronized wmf-config/InitialiseSettings-labs.php: Make Page Previews use RESTBase on Beta Cluster (duration: 00m 42s) [14:08:06] (03CR) 10Ottomata: [C: 032] Configure analytics cluster nodes to use thirdparty/cloudera apt component [puppet] - 10https://gerrit.wikimedia.org/r/337877 (https://phabricator.wikimedia.org/T155726) (owner: 10Ottomata) [14:08:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:20] it's been a while since i've run scap -- it's less noisy ;) [14:09:07] magic [14:09:24] hashar: put the wand down :P [14:09:39] RECOVERY - puppet last run on darmstadtium is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [14:11:26] (03CR) 10Ottomata: Only pipe /v2/stream requests to EventStreams service, everything else can be cached by varnish (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/340246 (https://phabricator.wikimedia.org/T158066) (owner: 10Ottomata) [14:14:16] Reedy: The Schema: NS is used for event logging, but the Config: for what exactly? 
[14:14:30] Configuration [14:14:31] Apparently [14:15:17] Reedy, yes - but for what extension/tool or is it just universal to store json stuff (just want to know for the docs) [14:15:29] It seems to be [14:16:46] * Steinsplitter notes [14:17:08] hashar: I imagine syncing the labs config files achieves the following promise: files deployed on the cluster = files on the repo + security fixes [14:23:53] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340312 [14:24:01] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340312 [14:25:49] (03PS2) 10Hashar: Enable rspec testing in Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/340186 (https://phabricator.wikimedia.org/T78342) [14:29:16] (03CR) 10jerkins-bot: [V: 04-1] Enable rspec testing in Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/340186 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [14:32:29] !log run pt-table-checksum on eowiki (s2) - T154485 [14:32:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:32:34] T154485: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485 [14:34:03] is swat done? [14:34:32] Starting to upgrade the analytics cluster, everything is silenced but if you see weird errors please ping me [14:34:43] I only see 2 patches [14:34:46] so probably yes [14:35:00] !log start the Analytics Hadoop cluster upgrade (https://etherpad.wikimedia.org/p/analytics-cdh5.10) [14:35:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:21] Dereckson and me are running namespaceDupes on terbium but I don't think that'll conflict? Please advise.
[14:35:32] well, he is, I can't do that [14:37:05] if it takes some time, please add it to the top of https://wikitech.wikimedia.org/wiki/Deployments#Week_of_February_27th better be sure [14:37:49] PROBLEM - carbon-local-relay metric drops on graphite2001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [100.0] [14:38:00] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340312 (owner: 10Jcrespo) [14:38:09] PROBLEM - carbon-local-relay metric drops on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [100.0] [14:39:08] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340312 (owner: 10Jcrespo) [14:39:16] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340312 (owner: 10Jcrespo) [14:39:49] RECOVERY - carbon-local-relay metric drops on graphite2001 is OK: OK: Less than 1.00% above the threshold [25.0] [14:41:09] RECOVERY - carbon-local-relay metric drops on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [14:46:35] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1053 after maintenance (duration: 00m 39s) [14:46:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:14] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1003.eqiad.wmnet [14:51:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:19] (03PS1) 10Jcrespo: mariadb: Depool db1056 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T125885) [14:56:53] (03CR) 10Subramanya Sastry: "wrong bug id?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T125885) (owner: 10Jcrespo) [14:57:09] PROBLEM - Disk space on notebook1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:57:19] PROBLEM - Disk space on notebook1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:59:32] 06Operations: Restructure our internal repositories further - https://phabricator.wikimedia.org/T158583#3041135 (10fgiunchedi) Thanks @MoritzMuehlenhoff for the proposal! One thing I've struggled in the past is for a given package where to upload it, in particular for internally-packaged software this informatio... [15:02:01] 06Operations, 06Performance-Team, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, and 2 others: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc) - https://phabricator.wikimedia.org/T156922#3061198 (10Joe) I took what @Krinkle did in his patchset, fixed a couple of things in order t... [15:04:29] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:07:24] (03CR) 10Jcrespo: "Yes, sorry for the noise." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T125885) (owner: 10Jcrespo) [15:08:09] RECOVERY - Disk space on notebook1002 is OK: DISK OK [15:08:09] RECOVERY - Disk space on notebook1001 is OK: DISK OK [15:08:33] (03CR) 10BBlack: [WIP] DNS: service discovery (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/331789 (https://phabricator.wikimedia.org/T156100) (owner: 10BBlack) [15:09:47] (03PS2) 10Jcrespo: mariadb: Depool db1056 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T147747) [15:12:31] (03PS1) 10Filippo Giunchedi: Add oresrdb2001 VM [dns] - 10https://gerrit.wikimedia.org/r/340323 (https://phabricator.wikimedia.org/T159207) [15:16:29] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[docker-engine] [15:20:21] (03PS2) 10Filippo Giunchedi: Add oresrdb2001 VM [dns] - 10https://gerrit.wikimedia.org/r/340323 (https://phabricator.wikimedia.org/T159207) [15:20:26] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1056 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [15:20:29] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "A few corrections needed." 
(033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/331789 (https://phabricator.wikimedia.org/T156100) (owner: 10BBlack) [15:22:43] (03Merged) 10jenkins-bot: mariadb: Depool db1056 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [15:22:55] (03CR) 10jenkins-bot: mariadb: Depool db1056 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340319 (https://phabricator.wikimedia.org/T147747) (owner: 10Jcrespo) [15:23:50] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1056 for maintenance (duration: 00m 40s) [15:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:39] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps: Give Max Semenik deployment rights for Maps - https://phabricator.wikimedia.org/T158820#3061355 (10mark) >>! In T158820#3058840, @RobH wrote: > So there wasn't an ops meeting today, but I've emailed our team about processing this request. When ther... 
[15:29:48] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=wdqs1001.eqiad.wmnet [15:29:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:12] !log depooling wdqs1001 due to instability [15:30:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:29] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [15:39:00] 06Operations, 06Performance-Team, 05DC-Switchover-Prep-Q3-2016-17, 07Epic, and 3 others: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc) - https://phabricator.wikimedia.org/T156922#3061405 (10Joe) [15:39:09] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 179191 [15:46:05] (03PS3) 10Giuseppe Lavagetto: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - 10https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922) (owner: 10Krinkle) [15:47:59] !log running alter table on db1056 T147747 [15:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:27] (03CR) 10Paladox: [C: 031] "Tested on https://gerrit-new.wmflabs.org/ and https://phab-01.wmflabs.org/T20 :)" [puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [15:51:34] (03PS1) 10Hashar: elasticsearch: fix spec for GC settings [puppet] - 10https://gerrit.wikimedia.org/r/340325 [15:53:14] (03PS3) 10Hashar: Enable rspec testing in Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/340186 (https://phabricator.wikimedia.org/T78342) [15:53:16] (03CR) 10Paladox: [C: 031] "This will only work on open patches, i will try and do this for merges later." 
[puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [15:53:37] (03CR) 10Ema: [C: 031] Add oresrdb2001 VM [dns] - 10https://gerrit.wikimedia.org/r/340323 (https://phabricator.wikimedia.org/T159207) (owner: 10Filippo Giunchedi) [15:53:44] (03CR) 10Gehel: [C: 032] elasticsearch: fix spec for GC settings [puppet] - 10https://gerrit.wikimedia.org/r/340325 (owner: 10Hashar) [15:53:53] (03CR) 10Filippo Giunchedi: [C: 032] Add oresrdb2001 VM [dns] - 10https://gerrit.wikimedia.org/r/340323 (https://phabricator.wikimedia.org/T159207) (owner: 10Filippo Giunchedi) [15:54:09] gehel: that was fast :-} [15:54:36] hashar: it's an easy review... and I know I have made you wait too much on a few others... [15:55:33] the whole rspec thing, I think I am almost done. Will probably announce it to ops list this week [15:57:16] (03PS1) 10Gehel: Give Max Semenik deployment rights for Maps [puppet] - 10https://gerrit.wikimedia.org/r/340327 (https://phabricator.wikimedia.org/T158820) [15:57:43] gehel: there is another one related to base::certificate that moved from 'base' module to 'profile' module. https://gerrit.wikimedia.org/r/#/c/340222/ but I am not sure what the test was doing in the first place. I made it pass but I am not sure the original intent is respected :( [15:58:14] hashar: having a look right now (and yes, you do well to ping me again!) [15:58:33] I ended up having to add the rspec-puppet boilerplate to profile [15:58:45] just to be able to run that single test. 
It is not perfect but good enough to run that single test :} [15:59:55] small steps will get us there (eventually) [16:01:26] hopefully [16:01:57] (03Draft1) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 [16:02:01] (03PS2) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 [16:02:20] (03PS2) 10Gehel: base/profile: fix spec for base::certificate [puppet] - 10https://gerrit.wikimedia.org/r/340222 (owner: 10Hashar) [16:03:20] (03PS3) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) [16:04:02] (03CR) 10Gehel: [C: 031] "LGTM, the initial intent is kept. The main intent of that test was to ensure that dependencies of `profile::base::certificates` are clearl" [puppet] - 10https://gerrit.wikimedia.org/r/340222 (owner: 10Hashar) [16:04:41] gehel: something I found is we can add a description the "it" statements :-} [16:04:58] that saves us from having to add an extra describe and a level of indentation (eg https://gerrit.wikimedia.org/r/#/c/340222/2/modules/profile/spec/classes/profile_base_certificate_spec.rb ) [16:05:10] * gehel does not like this cucumber like syntax... 
[16:05:22] (03CR) 10Gehel: [C: 032] base/profile: fix spec for base::certificate [puppet] - 10https://gerrit.wikimedia.org/r/340222 (owner: 10Hashar) [16:05:44] blame puppet/ruby world :} [16:05:57] (03CR) 10Paladox: [C: 031] "Tested with an admin account and user account on https://gerrit-new.wmflabs.org/ and works :)" [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [16:06:13] * gehel blames hipster driven development [16:07:05] * hashar rigole en se roulant par terre [16:09:09] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 9471 [16:10:20] RainbowSprinkles, according to https://gerrit-review.googlesource.com/#/c/96370/ passwords were not hashed, does that mean anyone with access to gerrit's db can access our password? [16:15:24] (03PS1) 10Ottomata: Add flume-ng to cloudera component whitelist [puppet] - 10https://gerrit.wikimedia.org/r/340328 (https://phabricator.wikimedia.org/T155726) [16:17:30] (03CR) 10Elukey: [C: 031] Add flume-ng to cloudera component whitelist [puppet] - 10https://gerrit.wikimedia.org/r/340328 (https://phabricator.wikimedia.org/T155726) (owner: 10Ottomata) [16:20:33] (03CR) 10Ottomata: [C: 032] Add flume-ng to cloudera component whitelist [puppet] - 10https://gerrit.wikimedia.org/r/340328 (https://phabricator.wikimedia.org/T155726) (owner: 10Ottomata) [16:28:29] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#2657740 (10Dzahn) I think dash.wmflabs.org would be no problem at all, but dash.wikimedia.org might be because it implies a production service a... [16:28:54] 06Operations, 10MediaWiki-General-or-Unknown, 06Multimedia: Segmentation fault creating thumbnail - https://phabricator.wikimedia.org/T159242#3061533 (10ema) [16:31:39] PROBLEM - puppet last run on mc1024 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:33:39] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 488 [16:53:13] (03PS1) 10Ema: vcl: stale-while-revalidate and full grace on sick backends [puppet] - 10https://gerrit.wikimedia.org/r/340335 [16:57:29] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:58:39] RECOVERY - puppet last run on mc1024 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T1700). [17:00:04] hashar: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [17:05:24] I got a few patches for the puppet swat if one is available for them [17:05:25] 3 should be straightforward, 1 has potential impact to prod so might want to skip it fornow [17:05:25] !log restarting blazegraph on wdqs1001 - T159245 [17:05:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:05:25] T159245: wdqs1001 and wdqs1003 unresponsive - https://phabricator.wikimedia.org/T159245 [17:05:25] not me today [17:05:26] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=wdqs1001.eqiad.wmnet [17:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:05:58] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3061658 (10Vojtech.dostal) dash.wmflabs.org is a bit awkward. could the WMF education team register a completely different domain of first ord... 
[17:09:44] !log disabling replication lag alerts on db1026 (depooled) [17:09:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:10:25] jynus: db1026 will get delayed by your alter or is it an online one? [17:10:31] it is online [17:10:38] oki [17:10:40] but it seems s5 activity is high now [17:10:49] and there is not control for it because it is depooled [17:11:03] !log Analytics Hadoop cluster upgraded to CDH 5.10 [17:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:12:10] elukey: \o/ [17:12:26] paravoid: next step is all on Jessie :) [17:13:03] Manu: there is some worrying query activity on enwiki, but it is api servers, so not related to my changes [17:13:31] all from a single server, so it may not be mediawiki, but on db1066 [17:14:47] let's see [17:21:34] don't worry too much [17:21:44] I was just noting some trends [17:22:01] 06Operations, 06Discovery, 06Discovery-Search, 10Wikidata, 10Wikidata-Query-Service: collect usual GC metrics for Blazegraph JVMs - https://phabricator.wikimedia.org/T159248#3061716 (10Gehel) [17:22:04] I was checking storage and graphs compared to db1072, but if it was a real big problem I would assume we'd be seen some lag [17:22:09] 06Operations, 06Discovery, 06Discovery-Search, 10Wikidata, 10Wikidata-Query-Service: collect usual GC metrics for Blazegraph JVMs - https://phabricator.wikimedia.org/T159248#3061729 (10Gehel) p:05Triage>03High [17:22:22] no, lag is not a problem [17:22:29] PROBLEM - High lag on wdqs1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [17:22:30] running out of connections, though... 
[17:22:40] that is why I pooled 3 servers for enwiki [17:22:59] yes, looking at the latency looks like it is struggling a bit to be fast [17:23:02] the other seem cool, though [17:23:04] compared with the other hosts [17:23:07] yes, the others look fine [17:23:12] I am glad we have db1072 :p [17:23:17] so just taking a mental note [17:23:24] to check it at some point [17:23:58] I am going to log off I think, if you need anything, please call [17:24:02] I think there is a ticker for that [17:24:04] same here [17:24:10] :-) [17:24:13] see you tomorrow! [17:25:29] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [17:26:29] PROBLEM - High lag on wdqs1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] [17:27:26] ACKNOWLEDGEMENT - High lag on wdqs1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1800.0] Gehel updates are catching up after blazegraph being frozen - T159245 [17:32:15] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, 13Patch-For-Review: Give Max Semenik deployment rights for Maps - https://phabricator.wikimedia.org/T158820#3061752 (10Gehel) [17:32:38] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, 13Patch-For-Review: Give Max Semenik deployment rights for Maps - https://phabricator.wikimedia.org/T158820#3048354 (10Gehel) Since @mark approved, I ticked the last checkbox. 
[17:35:56] (03CR) 10Chad: [C: 031] Give Max Semenik deployment rights for Maps [puppet] - 10https://gerrit.wikimedia.org/r/340327 (https://phabricator.wikimedia.org/T158820) (owner: 10Gehel) [17:36:26] (03PS2) 10Gehel: Give Max Semenik deployment rights for Maps [puppet] - 10https://gerrit.wikimedia.org/r/340327 (https://phabricator.wikimedia.org/T158820) [17:39:23] (03CR) 10Gehel: [C: 032] Give Max Semenik deployment rights for Maps [puppet] - 10https://gerrit.wikimedia.org/r/340327 (https://phabricator.wikimedia.org/T158820) (owner: 10Gehel) [17:39:23] MaxSem: ^ [17:39:23] weeeee [17:39:23] thanks =) [17:39:40] np [17:40:54] hashar: sorry I missed puppet swat today :( [17:41:05] godog: no worries :} [17:41:12] hashar: I'm looking at your patches see if there's something I can merge now [17:41:25] there are a few that should be striaghtforward [17:41:42] one changes rsyslog.conf snippet so it probably needs some extra time. feel free to skip it [17:42:10] ok thanks [17:42:30] I am around for the next 17 minutes or os [17:42:33] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Shreyas Lakhtakia (shrlak) - https://phabricator.wikimedia.org/T158978#3061762 (10Jgreen) [17:43:03] hashar: hehe ok, I'll merge https://gerrit.wikimedia.org/r/#/c/282484 and https://gerrit.wikimedia.org/r/#/c/307223/ [17:43:30] (03PS18) 10Filippo Giunchedi: Rake helper to run rspec in all modules having specs [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [17:43:37] godog: sounds good [17:43:55] !log Updating RESTBase mobileapps tables (wikipedia) to uses time-windowed compaction [17:43:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:00] all those rake / rspec changes I will craft some announcement / explanation to ops list soonish [17:44:51] 339176 systemd: add spec , might want a review 
from someone familiar with rspec tests. Though it is a noop for prod at least [17:45:45] (03CR) 10Filippo Giunchedi: [C: 032] Rake helper to run rspec in all modules having specs [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [17:45:57] 06Operations, 10OCG-General, 10Reading-Community-Engagement, 06Reading-Web-Backlog, and 3 others: [EPIC] (Proposal) Replicate core OCG features and sunset OCG service - https://phabricator.wikimedia.org/T150871#3061768 (10JKatzWMF) [17:46:46] (03PS13) 10Filippo Giunchedi: Use rake tasks to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar) [17:53:54] 06Operations, 10Revision-Scoring-As-A-Service-Backlog: Set up oresrdb redis node in codfw - https://phabricator.wikimedia.org/T139372#3061788 (10Halfak) @fgiunchedi, replication should work. How do you imagine that we'd configure the codfw workers to connect to redis? I imagine that they wouldn't be able to... [17:53:56] godog: looks like CI has completed on https://gerrit.wikimedia.org/r/#/c/307223/ :} [17:56:28] (03CR) 10Filippo Giunchedi: [C: 032] Use rake tasks to run modules spec [puppet] - 10https://gerrit.wikimedia.org/r/307223 (owner: 10Hashar) [17:56:52] hashar: thanks for the ping! merged [17:57:29] RECOVERY - High lag on wdqs1001 is OK: OK: Less than 30.00% above the threshold [600.0] [17:58:00] godog: thank you! the others (systemd related) can wait [17:58:09] will try to sink someone into reviewing them :} [18:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T1800). Please do the needful. 
[18:06:49] 06Operations, 06Discovery, 06Discovery-Search, 10Wikidata, 10Wikidata-Query-Service: collect usual GC metrics for Blazegraph JVMs - https://phabricator.wikimedia.org/T159248#3061716 (10Smalyshev) Where can I see the ES example? [18:07:26] 06Operations, 06Discovery, 06Discovery-Search (Current work): Add elasticsearch 5 .deb to reprepro experimental repository - https://phabricator.wikimedia.org/T159168#3061863 (10EBernhardson) @gehel we'll want to do this one sooner than later, but looking through the puppet code i'm completely unsure of what... [18:08:43] 06Operations, 06Discovery, 06Discovery-Search (Current work): Add elasticsearch 5 .deb to reprepro experimental repository - https://phabricator.wikimedia.org/T159168#3061865 (10Gehel) The base documentation is probably https://wikitech.wikimedia.org/wiki/Reprepro, but I'll need some time to dig into it and... [18:19:48] (03PS1) 10Chad: group0 to wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340344 [18:20:58] (03CR) 10Chad: [C: 04-2] "4 l8r homies" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340344 (owner: 10Chad) [18:22:52] (03PS1) 10Chad: scap clean: Forgot git import [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340345 [18:23:49] (03PS1) 10Muehlenhoff: Script for offboarding a user (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/340346 [18:24:10] (03PS2) 10Chad: scap clean: Forgot git/utils imports [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340345 [18:24:24] (03CR) 10Chad: [C: 032] scap clean: Forgot git/utils imports [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340345 (owner: 10Chad) [18:27:43] moritzm: let me know when it's not anymore WIP and ready for a review ;) ^^^ [18:28:04] will do :-) [18:28:07] probably tomorrow [18:31:37] !log demon@tin Started scap: testwiki to wmf.14 + l10n bootstrap [18:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:32:40] (03CR) 10Gergő Tisza: [C: 031] "Thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [18:33:10] (03CR) 10Paladox: [C: 031] "Your welcome :)" [puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [18:33:48] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3061963 (10Dzahn) What about the option of using dash.wikimedia.org (or outreach.wikimedia.org?) but also actually moving your service into prod... [18:35:31] (03PS2) 10Ema: vcl: stale-while-revalidate and full grace on sick backends [puppet] - 10https://gerrit.wikimedia.org/r/340335 [18:35:37] (03Merged) 10jenkins-bot: scap clean: Forgot git/utils imports [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340345 (owner: 10Chad) [18:37:28] (03CR) 10jenkins-bot: scap clean: Forgot git/utils imports [mediawiki-config] - 10https://gerrit.wikimedia.org/r/340345 (owner: 10Chad) [18:38:41] (03PS3) 10Dzahn: Gerrit: Make gerritbot report the repo in the comment [puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [18:39:34] (03CR) 10jerkins-bot: [V: 04-1] Script for offboarding a user (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/340346 (owner: 10Muehlenhoff) [18:40:58] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3061975 (10Ragesoss) @Dzahn I would love to move it production. I understand that it's a more complicated thing than many of those "microservice... [18:42:09] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:42:19] PROBLEM - zotero on sca1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:42:49] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Zotero alive) is CRITICAL: Test Zotero alive returned the unexpected status 404 (expecting: 200) [18:42:59] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [18:43:09] RECOVERY - zotero on sca1004 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.006 second response time [18:43:49] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy [18:46:30] jenkins-bot is too slow [18:49:19] (03PS3) 10Ema: vcl: grace, keep and expired TTLs [puppet] - 10https://gerrit.wikimedia.org/r/340335 [18:55:35] (03CR) 10Dzahn: [C: 032] Gerrit: Make gerritbot report the repo in the comment [puppet] - 10https://gerrit.wikimedia.org/r/340286 (https://phabricator.wikimedia.org/T159202) (owner: 10Paladox) [18:57:19] (03CR) 10Dzahn: [C: 031] "@Paladox lgtm, could you just do a rebase please? it needs manual one because i merged your other change first. 
thanks" [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [18:57:50] (03PS3) 10Dzahn: phabricator: include project creations with policies other than public+all-users [puppet] - 10https://gerrit.wikimedia.org/r/317990 (owner: 10Alex Monk) [19:00:50] (03PS4) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) [19:01:01] (03PS5) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) [19:04:52] !log Updating RESTBase mobileapps tables (wikimedia) to uses time-windowed compaction [19:04:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:05:22] (03PS6) 10Paladox: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) [19:06:19] PROBLEM - Check whether ferm is active by checking the default input chain on analytics1039 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [19:07:19] RECOVERY - Check whether ferm is active by checking the default input chain on analytics1039 is OK: OK ferm input default policy is set [19:09:01] (03CR) 10Dzahn: [C: 032] phabricator: include project creations with policies other than public+all-users [puppet] - 10https://gerrit.wikimedia.org/r/317990 (owner: 10Alex Monk) [19:13:49] (03CR) 10Dzahn: "done. i have sent one test mail to aklapper@ and myself to check script is still working." 
[puppet] - 10https://gerrit.wikimedia.org/r/317990 (owner: 10Alex Monk) [19:14:39] (03CR) 10Dzahn: [C: 032] Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [19:14:41] (03PS4) 10Gehel: maps: make version of nodejs configurable [puppet] - 10https://gerrit.wikimedia.org/r/339670 [19:14:48] (03PS7) 10Dzahn: Gerrit: Fix it so that if you are the author and uploader it will only show By: [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [19:15:10] (03CR) 10Gehel: maps: make version of nodejs configurable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/339670 (owner: 10Gehel) [19:15:37] (03PS8) 10Dzahn: Gerrit: only show author and uploader if they are different people [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [19:16:29] (03CR) 10Gehel: [C: 032] maps: make version of nodejs configurable [puppet] - 10https://gerrit.wikimedia.org/r/339670 (owner: 10Gehel) [19:16:29] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack [19:16:36] (03PS4) 10Paladox: Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - 10https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) [19:16:58] (03PS5) 10Paladox: Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - 10https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) [19:18:24] (03PS9) 10Dzahn: Gerrit: only show author and uploader if they are different people [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [19:18:30] (03CR) 10Dzahn: [V: 032 C: 032] Gerrit: only show author and uploader if 
they are different people [puppet] - 10https://gerrit.wikimedia.org/r/340326 (https://phabricator.wikimedia.org/T76291) (owner: 10Paladox) [19:26:51] !log demon@tin Finished scap: testwiki to wmf.14 + l10n bootstrap (duration: 55m 14s) [19:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:44] RainbowSprinkles: Could I add https://gerrit.wikimedia.org/r/#/c/340350/ to the train please? It fixes a JS error [19:29:06] Train's already out, backport :) [19:29:11] That is a backport :) [19:29:16] Happy to deploy it myself if you approve [19:29:24] I can do it no worries [19:29:27] Cool thanks [19:29:31] +2 [19:29:56] Timo did everything right and merged a breaking change right after last week's train, but for some reason we didn't notice it broke something until 15 minutes ago [19:30:08] Sneaky sneaky [19:34:17] !log demon@tin Synchronized php-1.29.0-wmf.14/extensions/WikimediaEvents/modules/ext.wikimediaEvents.geoFeatures.js: Roan made me do it (duration: 00m 39s) [19:34:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:34:55] RoanKattouw: Done done [19:36:03] Thanks! [19:36:14] yw [19:36:29] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack [19:37:19] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.40 seconds [19:39:29] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack [19:48:41] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062165 (10Dzahn) @Ragesoss I am willing to help with the productionizing. One thing to start with would be a list of requirements (software pac... [19:50:00] did db1047 complain? 
[19:50:13] 06Operations, 10Domains, 10Education-Program-Dashboard, 10Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062166 (10Dzahn) p:05Triage>03Normal [19:50:29] 06Operations, 10Continuous-Integration-Config, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#3062167 (10hashar) It has taken ages but finally we have some basic setup for running rspec-puppet tests. Next steps will be: * adding some more... [19:51:07] (03Abandoned) 10Hashar: Enable rspec testing in Jenkins [puppet] - 10https://gerrit.wikimedia.org/r/340186 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [19:52:09] yes, it did [19:52:39] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, 13Patch-For-Review: Give Max Semenik deployment rights for Maps - https://phabricator.wikimedia.org/T158820#3062173 (10Dzahn) 05Open>03Resolved a:03Dzahn [19:52:55] 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, 13Patch-For-Review: Give Max Semenik deployment rights for Maps - https://phabricator.wikimedia.org/T158820#3048354 (10Dzahn) a:05Dzahn>03RobH [19:53:35] Battery State: Failed [19:54:22] wait, are there 2? [19:56:00] 06Operations, 10RESTBase, 10service-runner, 13Patch-For-Review, 06Services (doing): enable restbase syslog/file logging - https://phabricator.wikimedia.org/T112648#3062191 (10Pchelolo) I've put the patch for puppet SWAT on March 02. [20:00:04] RainbowSprinkles: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170228T2000). 
[20:00:09] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[20:00:28] (CR) Chad: [C: 2] group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/340344 (owner: Chad)
[20:03:04] (Merged) jenkins-bot: group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/340344 (owner: Chad)
[20:03:26] (CR) jenkins-bot: group0 to wmf.14 [mediawiki-config] - https://gerrit.wikimedia.org/r/340344 (owner: Chad)
[20:03:29] (PS10) Fdans: Changes to perf consumer of event logging events [puppet] - https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: Nuria)
[20:03:50] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.14
[20:03:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:09] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[20:08:17] RoanKattouw: Aye, oops :)
[20:09:05] (CR) jerkins-bot: [V: -1] Changes to perf consumer of event logging events [puppet] - https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) (owner: Nuria)
[20:09:06] RoanKattouw: RE: rIC failure
[20:09:57] RoanKattouw: Not sure I understand why mw.riC(, number) would fail. Native throws, but mw.riC does not and we're not using native anywhere yet. That commit is still pending review in master.
[20:10:09] PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[20:10:20] Try mw.requestIdleCallback(function () {}, 10) in the console
[20:13:37] jynus: do you have a moment for a question?
[20:13:53] I was about to leave
[20:13:59] is it easy?
[20:14:04] Krinkle: No it was merged a week ago
[20:14:09] or it will take long?
[20:14:17] mw.riC went back to being a polyfill when needed only
[20:14:19] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:14:23] if not, send an email, nuria
[20:14:35] jynus: we can talk tomorrow, np. ciao
[20:14:37] So on clients that have native riC, the polyfill is no longer used but the native function is used now
[20:14:48] No, I drafted that commit a week ago
[20:14:51] It was not merged 12 hours ago
[20:15:08] It was merged 7 hours ago.
[20:15:09] RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 243 seconds ago with 0 failures
[20:15:24] Anyway, good group0 catch.
[20:15:36] Sorry, it was merged this morning, not 7 days ago, my bad
[20:16:00] RainbowSprinkles: Turns out we can blame Gilles :) he merged Timo's scary commit a few hours before the cut
[20:16:20] Yeah Moriel found it locally while the train was going out to group0
[20:16:21] RoanKattouw: rIC does have a timeout property on its options object parameter, however removing it here is hte right thing to do.
[20:16:35] Yeah I mean that code is nested inside a setTimeout 1000 already
[20:16:38] So it seemed superfluous
[20:16:47] for WikimediaEvents, since it's low priority. Yeah
[20:16:56] Good to know that a timeout prop exists though, I didn't realize that
[20:17:22] RoanKattouw: it's not a timeout really, it's a max-time-to-pass before it won't wait for an idle gap to come 'round but instead just schedule it in the event loop.
[20:17:29] I see
[20:17:33] Oh that's useful
[20:17:46] So for high priority stuff you want to give a chance to happen in a graceful way first
[20:17:55] to avoid interupting scroll and click handlers which should be given priority
[20:18:07] but not too long
[20:18:49] So far I've not yet found a good use for it, but it's good to have.
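The behaviour of the `timeout` member of requestIdleCallback's options object, as described at 20:17:22 to 20:18:07, can be illustrated with a toy model. This is a simplified Python sketch, not browser or `mw.requestIdleCallback` code: `simulate_idle_callback` is a hypothetical name, times are milliseconds after the request, and a forced callback is assumed to run exactly at the deadline (a real browser queues it as a task, so it runs at or after the deadline).

```python
from typing import Optional, Sequence, Tuple


def simulate_idle_callback(
    idle_starts: Sequence[float], timeout: Optional[float] = None
) -> Tuple[Optional[float], bool]:
    """Toy model: return (run_at, did_timeout) for a callback requested at t=0.

    idle_starts: times (ms) at which the main thread becomes idle.
    timeout: optional deadline (ms); None means wait for idle indefinitely.
    """
    first_idle = min(idle_starts, default=None)
    if first_idle is not None and (timeout is None or first_idle <= timeout):
        # An idle gap arrived in time: the callback runs gracefully.
        return first_idle, False
    if timeout is not None:
        # No idle gap before the deadline: assumed to run at the deadline.
        return timeout, True
    # Never idle and no deadline: the callback never runs in this model.
    return None, False
```

For example, `simulate_idle_callback([300], timeout=100)` returns `(100, True)`: the work is forced through after 100 ms rather than waiting for the idle gap at 300 ms. This is the "give it a chance to happen gracefully first, but not too long" trade-off discussed above.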
[20:25:50] (PS1) Chad: Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354
[20:26:25] (CR) Reedy: [C: 1] Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:27:01] (CR) Brian Wolff: [C: 1] Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:27:27] (CR) MaxSem: [C: 1] Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:27:52] (CR) Chad: [C: 2] Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:29:01] (Merged) jenkins-bot: Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:29:09] (CR) jenkins-bot: Disable creation of new forms on wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/340354 (owner: Chad)
[20:30:16] !log demon@tin Synchronized wmf-config/wikitech.php: no moar forms on wikitech (duration: 00m 39s)
[20:30:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:31:48] (CR) BBlack: [C: 1] vcl: grace, keep and expired TTLs [puppet] - https://gerrit.wikimedia.org/r/340335 (owner: Ema)
[20:33:05] !log Updating RESTBase mobileapps tables (phase0) to use time-windowed compaction
[20:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:34:14] (PS1) Filippo Giunchedi: install_server: add oresrdb2001 [puppet] - https://gerrit.wikimedia.org/r/340355 (https://phabricator.wikimedia.org/T159207)
[20:34:57] (CR) BBlack: [C: 1] Only pipe /v2/stream requests to EventStreams service, everything else can be cached by varnish (1 comment) [puppet] - https://gerrit.wikimedia.org/r/340246 (https://phabricator.wikimedia.org/T158066) (owner: Ottomata)
[20:35:06] (PS2) Filippo Giunchedi: install_server: add oresrdb2001 [puppet] - https://gerrit.wikimedia.org/r/340355 (https://phabricator.wikimedia.org/T159207)
[20:36:25] (PS3) Tim Landscheidt: Tools: Fix and simplify exim redirectors [puppet] - https://gerrit.wikimedia.org/r/148917
[20:39:58] (CR) Filippo Giunchedi: [C: 2] install_server: add oresrdb2001 [puppet] - https://gerrit.wikimedia.org/r/340355 (https://phabricator.wikimedia.org/T159207) (owner: Filippo Giunchedi)
[20:41:18] (CR) Tim Landscheidt: "This is just a rebase to work with the new invocation "localuser tools scfc" instead of "localuser scfc"; @yuvipanda's, @Coren's and @para" [puppet] - https://gerrit.wikimedia.org/r/148917 (owner: Tim Landscheidt)
[20:42:19] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[20:44:45] (PS1) MaxSem: Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357
[20:46:02] (CR) Reedy: [C: 1] Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357 (owner: MaxSem)
[20:47:14] (PS2) MaxSem: Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357
[20:47:34] (CR) MaxSem: [C: 2] Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357 (owner: MaxSem)
[20:48:54] (Merged) jenkins-bot: Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357 (owner: MaxSem)
[20:49:04] (CR) jenkins-bot: Disable SemanticForms API, slowwww [mediawiki-config] - https://gerrit.wikimedia.org/r/340357 (owner: MaxSem)
[20:50:18] !log maxsem@tin Synchronized wmf-config/wikitech.php: https://gerrit.wikimedia.org/r/#/c/340357/2 (duration: 00m 40s)
[20:50:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:52:27] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062482 (Dzahn) @Ragesoss @dduvall Oops, is this the same thing that is suggested for deletion here? https://gerrit.wikimedia.org/r/#/c/340...
[20:53:37] (CR) Dzahn: "is this the same thing that is "outreachdashboard.wmflabs.org"?" [puppet] - https://gerrit.wikimedia.org/r/340164 (owner: Dduvall)
[20:55:32] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062487 (Ragesoss) @Dzahn That was part of the groundwork I mentioned, but it isn't actively being worked on. I think the idea was to delete i...
[20:56:45] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062489 (Ragesoss) @Dzahn: a quick glance, it's also out of date in terms of the requirements; among other changes, the project is on Ruby 2.3...
[21:00:17] (CR) Ragesoss: "> is this the same thing that is "outreachdashboard.wmflabs.org"?" [puppet] - https://gerrit.wikimedia.org/r/340164 (owner: Dduvall)
[21:02:22] (CR) Dzahn: "alright. yea, but on phabricator we were just talking about moving this service to production, right. So we should complete this module an" [puppet] - https://gerrit.wikimedia.org/r/340164 (owner: Dduvall)
[21:03:17] (PS4) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[21:04:26] (CR) Ragesoss: "> alright. yea, but on phabricator we were just talking about moving" [puppet] - https://gerrit.wikimedia.org/r/340164 (owner: Dduvall)
[21:05:50] <_joe_> !log manually installing nodejs on wasat T156922
[21:05:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:05:57] T156922: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc) - https://phabricator.wikimedia.org/T156922
[21:05:59] <_joe_> Krinkle: done
[21:06:29] <_joe_> Krinkle: not sure if any additional package is needed, but I think not
[21:06:39] _joe_: nope
[21:06:48] <_joe_> you just used modules from core AFAIR, right
[21:07:21] (CR) Dzahn: "yes, both labs and production use the same puppet roles/modules. it would be tested in labs and once done the same thing would just be app" [puppet] - https://gerrit.wikimedia.org/r/340164 (owner: Dduvall)
[21:08:15] <_joe_> Krinkle: so the clone call should be like "nodejs warmup.js clone appservers codfw"
[21:08:51] <_joe_> I tried it after wiping the caches, both local and global
[21:11:19] _joe_: cool. I'm just gonna do a few nit picks and smaller scale testing to see if I can speed it up a little.
[21:12:17] <_joe_> sure, I rogue-committed, my changes were pretty horribly written
[21:12:44] Krinkle: (Possibly) dumb question...why node for this? I didn't review in depth, to be fair
[21:12:44] <_joe_> truth is every time I have to use node's http/https for anything less than trivial I end up reading the code
[21:13:07] <_joe_> RainbowSprinkles: because it's fairly easy to do concurrent http requests in node, I guess
[21:13:17] <_joe_> it's like it's classical use-case :)
[21:13:21] <_joe_> *its
[21:13:42] Ok, fair enough I guess. I was just thinking "command line tool -- why not python?" :)
[21:13:47] <_joe_> "it's like its" fooled me :|
[21:14:19] <_joe_> RainbowSprinkles: because the best thing to do concurrent requests in python is requests, which AFAIR doesn't work well with async frameworks
[21:14:47] <_joe_> I would've still picked python or go, which I don't despise as much as javascript :P
[21:15:31] Python's twisted has async http, iirc. But yeah, not my tool I'm mostly just armchairing :)
[21:16:06] google also points to tornado
[21:16:11] https://stackoverflow.com/questions/2632520/what-is-the-fastest-way-to-send-100-000-http-requests-in-python :)
[21:17:28] http://stackoverflow.com/a/25549675 - heh, 16 LOC
[21:18:18] <_joe_> If I had to write all the boilerplate, I'd use threading
[21:18:39] (PS3) Dzahn: mariadb/prometheus: remove workaround for precise [puppet] - https://gerrit.wikimedia.org/r/337204
[21:18:58] Solution further up that page does it in 38 loc :p
[21:19:06] (using Queue/Thread)
[21:19:37] Anyway, if it works in node it works. Mostly just asking the half-trolling questions :)
[21:19:59] (CR) Dzahn: [C: 2] mariadb/prometheus: remove workaround for precise [puppet] - https://gerrit.wikimedia.org/r/337204 (owner: Dzahn)
[21:22:06] (PS5) Dzahn: contint: drop npm settings for precise [puppet] - https://gerrit.wikimedia.org/r/337203 (https://phabricator.wikimedia.org/T158652)
[21:23:03] (CR) Dzahn: [C: 2] contint: drop npm settings for precise [puppet] - https://gerrit.wikimedia.org/r/337203 (https://phabricator.wikimedia.org/T158652) (owner: Dzahn)
[21:23:31] !log Completely disabled kartotherian on maps-test2004, it just logs errors
[21:23:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:24:57] andrewbogott: can we remove the icinga check called ""tools-checker-grid-start-precise" ("Start a job and verify on Precise") per "precise" ? there is still the identical thing just with "trusty" https://gerrit.wikimedia.org/r/#/c/337207/2
[21:25:29] had a series of patches removing precise-specific stuff
[21:27:54] (PS5) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[21:28:06] (CR) Krinkle: "Consistent coding style." [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922) (owner: Krinkle)
[21:30:18] <_joe_> Krinkle: line 35
[21:30:30] <_joe_> Krinkle: if (target), not if (!target)
[21:30:57] Thx
[21:30:58] (PS6) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[21:31:00] not yet tested :)
[21:31:08] _joe_: npm install, npm test
[21:31:15] or npm nit, if you use v3+
[21:31:21] npm it*
[21:33:18] Operations, Annual-Report, Security-Reviews, Patch-For-Review: add subdomain for annual report 2016 - https://phabricator.wikimedia.org/T151798#3062637 (Dzahn) Open>Resolved
[21:34:11] !log maxsem@tin Started deploy [kartotherian/deploy@81db48c]: Second attempt at 81db48c6db1fdfea81a01796ccbb7cfeccc43052
[21:34:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:04] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062640 (Dzahn) @Ragesoss Ok, just seems like a waste of good work to remove it entirely and start from scratch. Maybe we can start by updatin...
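The "Queue/Thread" approach mentioned at 21:18 to 21:19 as a Python alternative to node for firing many concurrent HTTP requests can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `warm_urls`, `http_status` and the `-1` failure marker are hypothetical illustration names and are not part of the cache-warmup tool in change 339802. The fetch function is injected so the pattern can be exercised without network access.

```python
import queue
import threading
import urllib.request
from typing import Callable, Dict, List


def http_status(url: str, timeout: float = 10.0) -> int:
    """Hypothetical fetcher: GET the URL and return its HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def warm_urls(
    urls: List[str], fetch: Callable[[str], int] = http_status, workers: int = 8
) -> Dict[str, int]:
    """Request every URL once using a fixed pool of worker threads."""
    jobs: queue.Queue = queue.Queue()
    for url in urls:
        jobs.put(url)  # fill the queue before any worker starts
    results: Dict[str, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            try:
                status = fetch(url)
            except Exception:
                status = -1  # record the failure instead of killing the worker
            with lock:
                results[url] = status

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every URL has been processed
    return results
```

The worker count bounds concurrency against the target servers. Because all URLs are enqueued up front, each worker can simply exit when the queue is empty, with no sentinel values needed.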
[21:36:53] (CR) Dzahn: [C: 2] Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[21:37:07] (PS6) Dzahn: Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[21:38:12] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062646 (Ragesoss) @Dzahn sounds good. I'll start pulling together all the dependency updates I notice. I added the Gerrit one, and I guess th...
[21:39:27] (PS2) Dzahn: Remove last settings for mc2001->mc2016 from puppet [puppet] - https://gerrit.wikimedia.org/r/339611 (https://phabricator.wikimedia.org/T157675) (owner: Elukey)
[21:40:50] !log maxsem@tin Finished deploy [kartotherian/deploy@81db48c]: Second attempt at 81db48c6db1fdfea81a01796ccbb7cfeccc43052 (duration: 06m 39s)
[21:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:44:34] !log Updating RESTBase mobileapps tables (all remaining) to use time-windowed compaction
[21:44:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:50:20] (PS7) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[22:07:54] (PS8) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[22:08:42] (PS9) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[22:09:44] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:09:55] (PS1) Jcrespo: Revert "mariadb: Depool db1056 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340416
[22:17:58] (PS1) Jcrespo: Revert "mariadb: Depool db1026 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/340417
[22:18:43] (CR) Dzahn: [C: 2] Remove last settings for mc2001->mc2016 from puppet [puppet] - https://gerrit.wikimedia.org/r/339611 (https://phabricator.wikimedia.org/T157675) (owner: Elukey)
[22:21:44] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[22:21:47] (PS3) Dzahn: DNS/Decom Remove production DNS for mc2001-mc2016 [dns] - https://gerrit.wikimedia.org/r/340195 (owner: Papaul)
[22:22:18] (CR) Dzahn: [C: 2] DNS/Decom Remove production DNS for mc2001-mc2016 [dns] - https://gerrit.wikimedia.org/r/340195 (owner: Papaul)
[22:23:03] (PS10) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[22:26:56] !log (T157675) - revoke puppet certs, deactivate nodes, rm from icinga. [puppetmaster1001:~] $ for mcnode in $(seq 2001 2016); do puppet node clean mc${mcnode}.codfw.wmnet && puppet node deactivate mc${mcnode}.codfw.wmnet ; done
[22:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:27:01] T157675: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675
[22:29:48] !log (T157675) - delete salt keys - [neodymium:~] $ for mcnode in $(seq 2001 2016); do sudo salt-key -d mc${mcnode}.codfw.wmnet; done
[22:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:31:25] (PS2) Hashar: interface: IPAddr.new() requires an address family [puppet] - https://gerrit.wikimedia.org/r/336840
[22:31:27] (PS1) Hashar: interface: add rspec boilerplate [puppet] - https://gerrit.wikimedia.org/r/340420
[22:32:25] (CR) Hashar: "That is just some basic rspec boilerplate. The first use case is in child change https://gerrit.wikimedia.org/r/#/c/336840/ which test so" [puppet] - https://gerrit.wikimedia.org/r/340420 (owner: Hashar)
[22:32:32] (CR) Paladox: [C: 1] "Bump" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:32:48] (PS7) Paladox: Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434)
[22:33:09] (PS8) Dzahn: Phabricator: Fix phd not starting up after reboot if it was previously stopped [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:33:34] (CR) jerkins-bot: [V: -1] interface: IPAddr.new() requires an address family [puppet] - https://gerrit.wikimedia.org/r/336840 (owner: Hashar)
[22:34:12] (CR) Dzahn: [C: 2] "beats the double (re)base https://en.wikipedia.org/wiki/Bass_drum#Double_bass_drum" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:34:31] (CR) Hashar: "rspec says the catalog manage to compile under ruby 2.4. I haven't tested with a previous ruby version though :/" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/336840 (owner: Hashar)
[22:34:36] (PS1) Awight: Deprecate DonationInterface i18n messages [mediawiki-config] - https://gerrit.wikimedia.org/r/340421 (https://phabricator.wikimedia.org/T159098)
[22:35:46] (CR) Paladox: [C: 1] "lol :)" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:35:48] Operations, ops-codfw, hardware-requests, Patch-For-Review, User-Elukey: Reclaim/Decommission old codfw mc2001->mc2016 hosts - https://phabricator.wikimedia.org/T157675#3062952 (Dzahn) The hosts are gone from icinga now after the commands above and running puppet on einsteinium.
[22:37:00] (PS4) Hashar: wmflib: os_version now fail when lsb vars are missing [puppet] - https://gerrit.wikimedia.org/r/308882
[22:37:56] (CR) Hashar: "Indeed the copy pasting was rather lame. PS4 now reuses lookupvar() results :}" [puppet] - https://gerrit.wikimedia.org/r/308882 (owner: Hashar)
[22:38:50] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[22:38:53] Operations, Domains, Education-Program-Dashboard, Traffic: Create short link for outreachdashboard.wmflabs.org - https://phabricator.wikimedia.org/T146332#3062964 (Dzahn) @Ragesoss Nice. Yea, those tasks on that workboard column sound about right to me. I'll comment on T159274 for getting the Ger...
[22:40:48] (CR) Paladox: [C: 1] "Passes tests and running on phab-01 :)" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:41:21] (CR) Dzahn: "thanks :) submitted!" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:41:38] (CR) Paladox: "Thanks and your welcome :)" [puppet] - https://gerrit.wikimedia.org/r/340242 (https://phabricator.wikimedia.org/T158434) (owner: Paladox)
[22:42:10] (PS11) Paladox: Phabricator: Migrate to base::service_unit for ssh-phab [puppet] - https://gerrit.wikimedia.org/r/339763 (https://phabricator.wikimedia.org/T137928)
[22:42:27] (PS9) Paladox: Phabricator: Migrate to base::service_unit for phd [puppet] - https://gerrit.wikimedia.org/r/340158 (https://phabricator.wikimedia.org/T137928)
[22:48:51] (CR) Dzahn: "compiler job started, result will appear later at" [puppet] - https://gerrit.wikimedia.org/r/334301 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[22:51:37] (Draft1) Paladox: Phabricator: Start and stop phd by force [puppet] - https://gerrit.wikimedia.org/r/340424
[22:51:40] (PS2) Paladox: Phabricator: Start and stop phd by force [puppet] - https://gerrit.wikimedia.org/r/340424
[22:53:03] RainbowSprinkles: was the Gerrit -> Github repo sync fully automatic or was there some kind of opt-in to activate it for each repo?
[22:53:58] You have to do it for per repo
[22:54:02] mutante ^^
[22:54:36] paladox: do you know how? on https://phabricator.wikimedia.org/T159274#3063020
[22:54:51] mutante: You need to create the empty repo at github
[22:54:53] mutante we could use phab
[22:54:58] And then when gerrit next does something, it all gets pushed
[22:55:14] it's better as it is easyer for everyone to edit. Not sure if you have permission to do it on the repo.
[22:55:24] thank you both
[22:55:27] I also have never done it through gerrit (mirroring a repo) :)
[22:55:52] Oh, so it is automatic :)
[22:58:34] what about the other way, github -> gerrit?
[22:58:43] oh, you are here :)
[22:58:56] that's even better
[22:59:14] github to gerrit is a nogo, afai
[22:59:23] at least, not automatically
[22:59:26] is that not what wiki-ai does?
[22:59:27] Reedy it's possible
[22:59:46] eg, https://github.com/wiki-ai/ores
[23:00:22] ah, so maybe they push to gerrit manually whenever it's time to deploy new code?
[23:01:03] it's problematic beacuse it's against L3
[23:01:07] afaict
[23:01:24] " Do not install software via pip, gem, pulling down a git repo from an external source, etc. If software that you need is not available, please work with Ops to get the package added to our repository."
[23:01:45] "ianal"
[23:02:02] jouncebot: next
[23:02:02] In 0 hour(s) and 57 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170301T0000)
[23:03:14] mutante: Gerrit -> Github is sorta automatic (as stated: just gotta make the empty repo first)
[23:03:26] If wiki-ai is being pulled into gerrit, someone's doing that by hand
[23:04:03] RainbowSprinkles: gotcha, and thank you
[23:04:24] Also: iirc we've got something in puppet that deploys straight from wiki-ai, and that's scary to me
[23:04:31] (same thing with deploying from github directly, generally)
[23:04:38] okay. I'll ask them how they are managing the flow of code, because I'm pretty sure they are using github as the main place for CR and issue management.
[23:04:57] (which is definitely what I plan to continue doing for the dashboard codebase)
[23:05:04] RainbowSprinkles: Can I make your day bad by asking for a real Phab repo to be deleted and re-created as a gerrit clone?
[23:05:25] James_F: Ugh, must you?
[23:05:37] RainbowSprinkles: Security (Reedy) insisted.
[23:05:49] Well
[23:05:51] ragesoss: I'm not generally opposed to using Github for development / issue tracking, but the code *must* be pulled back into Gerrit prior to deploying
[23:05:54] We can't deploy from phab...
[23:05:58] And if you want it deploying :P
[23:06:06] Deploying from Github directly triggers my rageface
[23:06:16] Reedy: Why can't we deploy from Phab?
[23:06:24] The repo's already moved from https://phabricator.wikimedia.org/source/3d/manage/ to https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/extensions/3D but I can't change it from the Web interface to be a clone.
[23:06:28] RainbowSprinkles: Does make-wmf-branch work with it?
[23:06:35] No, of course not!
[23:06:45] There's your answer
[23:06:46] :P
[23:07:07] James_F: Just gotta add the repo to track an upstream
[23:07:29] ragesoss: what Rainbow said :) we can make it easier by just telling puppet to git clone from gerrit. we don't have to do use scap or trebuchet
[23:07:37] RainbowSprinkles: I thought so too but I couldn't see how.
[23:08:02] Add new URI -> I/O type set to observe
[23:08:07] James_F: you want a Phab repo to mirror one from elsewhere? That's easy
[23:08:09] https://phabricator.wikimedia.org/source/3d/manage/uris/
[23:08:17] riOh, cool.
[23:08:41] TabbyCat: Indeed, I've done that bit before.
[23:08:50] mutante: s/or trecbuchet//
[23:08:57] *NOTHING* should start using that if it doesn't already
[23:10:43] RainbowSprinkles: :) yes. all it needs is git pull on puppet run, works fine for other small things
[23:12:20] That's what I'm thinking of doing for some of the leftover trebuchet crud
[23:12:39] RainbowSprinkles: Yay, all fixed, thank you.
[23:12:39] That is deployed so infrequently that they're not worth using scap over
[23:12:43] James_F: yw
[23:12:51] i do it for annualreport and it's just fine
[23:12:53] Now to get CI working on that repo.
[23:12:58] RainbowSprinkles im wandering could scap support clonning over annon http? Reason i ask is on labs it's hard to do it with an ssh key.
[23:13:13] The main reason behind it is for phabricator.
[23:15:28] paladox: there should be keys for keyholder in labs/private ?
[23:15:31] That's a puppet thing, not scap itself.
[23:15:51] But, https should be fine. It's mostly readonly anyway
[23:15:51] Yep, but dosent work for me
[23:15:52] allways fails
[23:15:53] paladox: in production the private keys are in private repo, so like with other things , add "fake" ones in labs/private repo
[23:16:13] Oh, but doint you have to do something like arm the key?
[23:16:17] I think that bit fails
[23:16:28] yes, but only once after rebooting a machine
[23:16:34] and then you have to type the key passphrases
[23:16:48] i have a ticket to change it so it's only one passphrase and not multiple
[23:17:11] tell me what exactly fails next time you try
[23:17:20] Oh, to deploy to target hosts? No, that requires ssh
[23:17:27] That can't be done over https
[23:17:46] (the initial *cloning* and *fetching* can/should be able to use https, if we configured it as such)
[23:18:01] so basically, the way to handle is probably to continue using github for managing development, and then manually pulling new code over to the gerrit repo before new deployments?
[23:18:19] we also had an issue with it because
[23:18:21] "The latest version of openssh-client no longer stores or outputs the key filenames along with the ssh public-key fingerprint as part of ssh-add -l."
[23:18:24] ragesoss: Yes, absolutely. It's been on my todo list to audit the places we've been violating this
[23:18:25] long story
[23:18:43] paladox: so it might even be related to ssh client version
[23:18:49] (Draft1) Paladox: Gerrit: Report repo in comment on merged patches too [puppet] - https://gerrit.wikimedia.org/r/340435
[23:18:52] (PS2) Paladox: Gerrit: Report repo in comment on merged patches too [puppet] - https://gerrit.wikimedia.org/r/340435
[23:18:54] (phabricator, or gerrit. something we control. just not freaking github :))
[23:18:55] Oh
[23:19:30] paladox: we can fix it the right way without a special case for labs i think :)
[23:19:32] RainbowSprinkles yeh git cloning over http :)
[23:19:52] as currently it failed puppet because it couldent clone the phabricator/deployment repo
[23:19:59] so we had to manually clone it for puppet to start working
[23:20:02] Oh
[23:20:05] :)
[23:21:11] We already do that
[23:21:29] The deploy user doesn't have an ssh key in Gerrit
[23:21:33] Same in production
[23:22:26] jouncebot: next
[23:22:26] In 0 hour(s) and 37 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170301T0000)
[23:22:30] If it can't clone that's not an ssh problem
[23:22:40] (PS3) Paladox: Gerrit: Report repo in comment on merged patches too [puppet] - https://gerrit.wikimedia.org/r/340435
[23:23:10] (PS4) Paladox: Gerrit: Report repo in comment on merged patches too [puppet] - https://gerrit.wikimedia.org/r/340435 (https://phabricator.wikimedia.org/T159202)
[23:23:18] Oh
[23:23:41] paladox: ^ i noticed that but thought to myself "i don't need to mention it, i bet there is a patch soon" :)
[23:23:52] lol
[23:23:54] :)
[23:24:47] We will need to do it for restored and abandoned but that can be done later as i have no idea what the bot says for those two comments :). I will wait to find out how it is formated first for abandoned and restored.
[23:24:54] the merged one can be merged :)
[23:25:00] did not expect the changes in "actions.config" heh
[23:25:12] Yeh, we need to define a custom message
[23:25:51] RainbowSprinkles this is https://github.com/wikimedia/puppet/blob/9787ffe723ec8139b1d94bca80cbf5b2a1b2566c/modules/phabricator/manifests/init.pp#L153 the problem
[23:26:07] but why was it already able to detect "on merge" if there was no action for that?
[23:26:25] Because it was using the pre defined its comments
[23:26:29] its = its-base comment
[23:26:56] notice action = add-standard-comment
[23:26:59] paladox: How is that the problem?
[23:27:17] Because it is trying to clone scap but failing because it carn't deploy it
[23:27:24] That's not cloning over ssh
[23:28:10] It's other ssh stuff
[23:28:15] Yes, which is required
[23:28:20] But it's not cloning over ssh
[23:28:22] I promise you
[23:28:27] oh
[23:28:38] How can i stop the other ssh stuff?
[23:28:40] I'm going to run a script that can load a bit the servers. Which maxlag value do you recomend to me?
[23:29:01] paladox: You *cant*
[23:29:06] oh
[23:29:06] scap requires ssh
[23:29:10] oh
[23:29:10] How else would it talk to targets?
[23:29:10] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:29:15] demon@tin /srv/deployment/phabricator/deployment (wmf/stable)$ git remote -v
[23:29:15] origin https://gerrit.wikimedia.org/r/p/phabricator/deployment (fetch)
[23:29:15] origin https://gerrit.wikimedia.org/r/p/phabricator/deployment (push)
[23:29:17] See ^
[23:29:22] Doesn't clone over ssh
[23:29:26] yep
[23:29:27] Could it support local host?
[23:29:44] It'll ssh to the local host, so you still need ssh keys
[23:29:48] TabbyCat: i think 5 , but only because the manual says it
[23:29:56] Oh that's probaly why it failed.
[23:30:01] TabbyCat: "Use maxlag=5 (5 seconds). This is an appropriate non-aggressive value, set as default value on Pywikibot. Higher values mean more aggressive behaviour, lower values are nicer."
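The maxlag advice quoted at 23:30:01 amounts to a retry loop: send every request with `maxlag=5`, and back off while the server reports that replication lag is above that threshold. The following is a minimal Python sketch of that loop. It assumes the MediaWiki Action API behaviour described in the Maxlag parameter manual, namely an error whose `code` is `"maxlag"` accompanied by a `Retry-After` header. `api_call_with_maxlag`, the injected `send` callable, and the fallback delay are hypothetical and are not taken from Pywikibot.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed shape of one response: (decoded JSON body, HTTP headers).
Response = Tuple[dict, Dict[str, str]]


def api_call_with_maxlag(
    send: Callable[[dict], Response],
    params: dict,
    maxlag: int = 5,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Send an API request with maxlag set, backing off while replicas lag."""
    request = dict(params, maxlag=maxlag)  # add maxlag to every request
    for _ in range(max_retries):
        body, headers = send(request)
        if body.get("error", {}).get("code") != "maxlag":
            return body  # success, or an unrelated error the caller handles
        # Lag is above maxlag: wait as the server suggests (assumed header),
        # falling back to maxlag seconds if Retry-After is absent.
        sleep(float(headers.get("Retry-After", maxlag)))
    raise RuntimeError("gave up: replication lag stayed above maxlag")
```

A lower `maxlag` makes the script yield sooner when the servers are struggling, which matches the quoted note that lower values are "nicer".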
[23:30:24] paladox: You need to figure out/fix your ssh keys problem, we're not adding kludges to scap to avoid the entire way it works :)
[23:30:30] Will use maxlag 5 then
[23:30:33] (even if the target is also the local host)
[23:30:34] Ok
[23:33:29] * paladox wonders should we implement in its-phabricator a way for gerritbot to un subscribe it's self when it's merged
[23:33:45] Does it matter?
[23:33:51] It's not sending e-mails anywhere
[23:33:52] Nope
[23:33:54] :)
[23:36:30] RECOVERY - nova instance creation test on labnet1001 is OK: PROCS OK: 1 process with command name python, args nova-fullstack
[23:37:35] RainbowSprinkles i've implemented the new maniphest.edit conduit api in its-phabricator so we can now do alot more then we could :)
[23:37:39] (PS11) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[23:38:09] https://gerrit-review.googlesource.com/#/c/98576/
[23:38:10] PROBLEM - puppet last run on maps-test2004 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 8 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[nodejs],Service[kartotherian]
[23:38:31] PROBLEM - Check systemd state on maps-test2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:39:30] PROBLEM - nova instance creation test on labnet1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name python, args nova-fullstack
[23:40:55] RainbowSprinkles ok, there's a major bug
[23:40:56] http://gerrit-new.wmflabs.org/#/c/2/
[23:40:58] see that
[23:41:00] bonjour
[23:41:12] Mutante this one's for you hallo
[23:41:22] this one is for usa hello
[23:41:35] Say hello
[23:42:58] paladox: i see links to "Germany" and Wikipedia, but why? :)
[23:43:04] Lol
[23:43:07] it's a bug
[23:43:13] which i won't be able to track
[23:43:25] as none of the comment messages say add Germany wikipedia link
[23:43:26] lol
[23:43:36] haha, what
[23:44:26] i assume you are telling upstream then :)
[23:44:31] Yes
[23:44:39] this is my message
[23:44:39] https://groups.google.com/forum/#!topic/repo-discuss/kAZ4XwrMZKA
[23:44:45] a huge bug in gerrit master
[23:44:46] lol
[23:44:48] paladox: if you find it in code you can use "git blame" to find out who and which commit added that
[23:45:00] I can't find the word germany in the code
[23:45:16] because it's Germany and capitalized?
[23:45:24] oh i found it
[23:45:24] https://github.com/GerritCodeReview/plugins_cookbook-plugin/search?utf8=✓&q=Germany
[23:45:33] lol they fixed cookbook plugin
[23:45:35] that cookbook plugin again? what the hack
[23:47:19] We don't need that anyway ;-)
[23:47:21] mutante: set -maxlag:5 in the grid job -- it's sleeping between 6 and 10 seconds between each edit -- hope that's correct
[23:47:52] * paladox uninstalls it
[23:48:41] I've submitted this upstream to remove it
[23:48:42] https://gerrit-review.googlesource.com/#/c/98870/
[23:48:48] RainbowSprinkles ^^
[23:48:50] TabbyCat: sorry, i don't know
[23:49:26] :) np - sorry
[23:51:10] paladox: Oh, I don't care what upstream does
[23:51:14] I'm saying we don't need/use it
[23:51:18] Keep it disabled :)
[23:51:23] Yep, but it seems to work now.
[23:51:31] Whereas before it seemed to not work
[23:52:10] * RainbowSprinkles shrugs
[23:52:19] I don't think we need to change upstream here
[23:52:24] Oh
[23:53:11] Anyway, it's beer time. Adios folks
[23:53:17] lol
[23:54:00] It was apparently caused by https://gerrit-review.googlesource.com/#/c/98753/
[23:55:57] it's adiós by the way :P
[23:56:25] TabbyCat: Yes, but I was too lazy to add the diacritic
[23:56:26] :p
[23:57:00] RainbowSprinkles: heh, well, I'll still be nice to you :)
[23:57:10] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[23:58:48] (PS12) Krinkle: [WIP] mediawiki: Add cache-warmup to maintenance [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922)
[23:59:11] (CR) Krinkle: "3min for the clone run. 30s for the spread run. Details:" [puppet] - https://gerrit.wikimedia.org/r/339802 (https://phabricator.wikimedia.org/T156922) (owner: Krinkle)