[00:01:13] Platonides: I updated the deployments table: https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=818061&oldid=818049
[00:01:34] Platonides: so next time you when you've something to deploy, you prefix with the target wmf branch
[00:01:44] and it works outside mw1099
[00:01:45] add a small description and the link to Gerrit
[00:01:48] thanks Dereckson
[00:01:54] You're welcome.
[00:09:39] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms
[00:18:29] (PS1) BryanDavis: logstash: new input for msgpack over UDP [puppet] - https://gerrit.wikimedia.org/r/306081 (https://phabricator.wikimedia.org/T143172)
[00:27:41] night
[00:30:23] good night
[00:54:37] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]
[00:58:37] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]
[01:07:49] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[01:11:17] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[01:13:08] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[01:20:18] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 3.31 ms
[01:29:08] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[01:48:08] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 6.88 ms
[01:50:29] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 1817.760422 Seconds
[01:52:37]
RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 112.665011 Seconds
[01:53:48] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[01:55:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[02:09:08] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:11:07] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[02:19:28] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:21:28] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID
[02:22:23] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.15) (duration: 10m 16s)
[02:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:07] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 23 02:28:07 UTC 2016 (duration 5m 44s)
[02:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:04:48] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:06:48] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:08:47] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID
[03:10:47] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[04:02:09] Operations, Ops-Access-Requests: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2574277 (Volker_E) @Legoktm I've been reaching out on #wikimedia-operations to understand what's the common/recommended way for my (quite urgent) needs. The idea of people.wikimedia.or...
[04:13:08] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:19:07] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[04:22:47] Operations, Discovery, Traffic, Wikidata, and 2 others: Tune WDQS caching headers - https://phabricator.wikimedia.org/T137238#2574309 (Smalyshev) @Gehel is anything left to do in this ticket?
[04:41:11] Operations, Ops-Access-Requests: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2574323 (Legoktm) If it's urgent, requesting a shell account that requiring waiting for ops approval seems like the wrong approach to me :) You can showcase your work anywhere, I gues...
[04:51:18] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:03:28] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[05:23:28] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:25:28] RECOVERY - MD RAID on install2001 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[05:39:56] Toollabs down?
[05:40:44] Getting ERR_EMPTY_RESPONSE and ERR_CONNECTION_RESET and Error 500 on http://tools.wmflabs.org/wikidata-game/ and https://tools.wmflabs.org/magnustools/random_image_commons_subcat.php?category=Files+uploaded+by+Josve05a+%28cleanup%29&d=3
[05:43:19] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL - No data received from host
[05:44:02] (PS2) Dzahn: DNS: Add mgmt and production DNS for wqds200[12] Bug:T142864 [dns] - https://gerrit.wikimedia.org/r/306056 (https://phabricator.wikimedia.org/T142864) (owner: Papaul)
[05:45:39] PROBLEM - MD RAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:49:28] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.039 second response time
[05:50:49] Josve05a: [22:48:37] !log tools restarted nginx on tools-proxy-01, was out of connection slots
[05:50:59] ah
[05:52:32] !log install2001 - "MD RAID" and "MegaRAID" icinga checks and both fail? new/test? install1001 doesn't have these checks - disabled notifications
[05:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:59:03] akosiaris: few more Apertium packages are ready (3), and which will unblock more (3). Please review when you've time.
[05:59:14] akosiaris: 2 are blocked on build timeout.
[06:00:48] (PS3) KartikMistry: WIP: Configurable mode_path for apertium [puppet] - https://gerrit.wikimedia.org/r/297350 (https://phabricator.wikimedia.org/T139330)
[06:06:07] (PS4) KartikMistry: apertium-eus: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-eus] - https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306)
[06:06:21] (CR) jenkins-bot: [V: -1] apertium-eus: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-eus] - https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[06:08:21] (CR) Dzahn: "recheck" [dns] - https://gerrit.wikimedia.org/r/306056 (https://phabricator.wikimedia.org/T142864) (owner: Papaul)
[06:08:49] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/2: down - Core: cr2-knams:xe-1/1/0 (GTT, 00341724) {#3466} [10Gbps MPLS]
[06:10:49] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0
[06:11:22] (PS5) KartikMistry: apertium-eus: Rebuild for Jessie and other fixes [debs/contenttranslation/apertium-eus] -
https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306)
[06:14:38] akosiaris: one more jenkins green!
[06:16:18] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:18:17] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID
[06:26:21] (PS4) KartikMistry: apertium-urd-hin: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306)
[06:27:00] (CR) jenkins-bot: [V: -1] apertium-urd-hin: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[06:28:23] (CR) KartikMistry: "apertium-hin need to upload to WMF repo for dependency." [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[06:32:36] (PS4) KartikMistry: apertium-en-es: Rebuilt for Jessie [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306)
[06:35:10] (CR) jenkins-bot: [V: -1] apertium-en-es: Rebuilt for Jessie [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[06:41:17] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/2: down - Core: cr2-knams:xe-1/1/0 (GTT, 00341724) {#3466} [10Gbps MPLS]
[06:45:17] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0
[06:51:42] (PS3) Glaisher: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/306015
(https://phabricator.wikimedia.org/T124013)
[06:53:46] Operations, ops-eqiad, DC-Ops, Labs, Labs-Infrastructure: Locate and assign some MD1200 shelves for proper testing of labstore1002 - https://phabricator.wikimedia.org/T101741#2574544 (yuvipanda) Did this happen? Does this still need to happen with the new labstore stuff that @chasemp / @madhu...
[06:54:34] Operations, Beta-Cluster-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2574547 (yuvipanda)
[07:03:12] (PS4) Nemo bis: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: Glaisher)
[07:03:20] (CR) Jcrespo: "I suppose this can only be deployed during maintenance, right?" [puppet] - https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) (owner: Dduvall)
[07:07:07] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:08:32] (CR) Nemo bis: [C: 1] Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: Glaisher)
[07:09:52] (CR) Nemo bis: "I didn't mention T37489 in the commit message because it doesn't seem to be your concern, but we'd need to do this too once T69223 is fixe" [mediawiki-config] - https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: Glaisher)
[07:09:54] (CR) Nikerabbit: [C: 1] Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: Glaisher)
[07:11:09] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID
[07:14:59] (CR) KartikMistry: "/tmp/hudson1687932244205747632.sh: line 5: /usr/bin/lintian-junit-report: No such file or directory - sounds like a configuration issue. @" [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[07:16:45] (PS1) Giuseppe Lavagetto: Temporarily remove conf1003 from the client etcd list.
[dns] - https://gerrit.wikimedia.org/r/306157
[07:29:29] !log installing botan security updates on trusty systems
[07:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:53:39] (CR) Giuseppe Lavagetto: [C: 2] Beta Scap: dsh groups in hieradata [puppet] - https://gerrit.wikimedia.org/r/306070 (owner: Thcipriani)
[07:58:36] (PS5) Giuseppe Lavagetto: scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529)
[08:00:45] (CR) Giuseppe Lavagetto: [C: 2] scap: use conftool data to populate dsh groups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) (owner: Giuseppe Lavagetto)
[08:01:32] (CR) Giuseppe Lavagetto: [V: 2] scap: use conftool data to populate dsh groups [puppet] - https://gerrit.wikimedia.org/r/305996 (https://phabricator.wikimedia.org/T132529) (owner: Giuseppe Lavagetto)
[08:03:49] (PS1) Giuseppe Lavagetto: scap::dsh: fix dsh template [puppet] - https://gerrit.wikimedia.org/r/306158
[08:04:18] (CR) Giuseppe Lavagetto: [C: 2 V: 2] scap::dsh: fix dsh template [puppet] - https://gerrit.wikimedia.org/r/306158 (owner: Giuseppe Lavagetto)
[08:06:47] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: puppet fail
[08:08:00] yuvipanda: FYI im on holiday for 9 days now, so maybe do the grafana puppet stuff when i get back?
[08:08:48] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:16:08] PROBLEM - MegaRAID on install2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:20:58] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[08:22:08] RECOVERY - MegaRAID on install2001 is OK: OK: no disks configured for RAID
[08:22:58] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[08:24:53] (PS1) Giuseppe Lavagetto: conftool: add all mw logical cluster to use for scap targets [puppet] - https://gerrit.wikimedia.org/r/306160
[08:26:08] (PS2) Giuseppe Lavagetto: conftool: add all mw logical cluster to use for scap targets [puppet] - https://gerrit.wikimedia.org/r/306160
[08:26:57] (CR) Giuseppe Lavagetto: [C: 2 V: 2] conftool: add all mw logical cluster to use for scap targets [puppet] - https://gerrit.wikimedia.org/r/306160 (owner: Giuseppe Lavagetto)
[08:27:06] (PS1) Nikerabbit: Update outdated comment for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/306161
[08:31:33] Operations, ops-codfw: install2001 hardware troubles - https://phabricator.wikimedia.org/T137647#2374163 (Volans) The CPU issue was back again, I've `rmmod acpi_pad` and the CPU usage is back to normal, but surely need some more investigation.
[08:32:56] (PS1) Nikerabbit: Remove no longer relevant $wgTranslateTasks overrides [mediawiki-config] - https://gerrit.wikimedia.org/r/306162
[08:44:40] (PS1) Gehel: WDQS caching headers [puppet] - https://gerrit.wikimedia.org/r/306163
[08:44:59] (PS2) Gehel: WDQS caching headers [puppet] - https://gerrit.wikimedia.org/r/306163 (https://phabricator.wikimedia.org/T137238)
[08:46:03] Operations, Discovery, Traffic, Wikidata, and 2 others: Tune WDQS caching headers - https://phabricator.wikimedia.org/T137238#2574633 (Gehel) @Smalyshev yes there is: adding some cache-control headers. Change submitted. Thanks for reminding me!
[08:51:33] (CR) Glaisher: "Why was optional and suggestions removed?" [mediawiki-config] - https://gerrit.wikimedia.org/r/306162 (owner: Nikerabbit)
[08:52:50] !log Jenkins had some deadlocks preventing builds from processing. Resolved by disabling/reenabling the Gearman client
[08:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:52:58] (CR) Nemo bis: Remove no longer relevant $wgTranslateTasks overrides (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/306162 (owner: Nikerabbit)
[09:00:22] !log updated wikitech-static to MW 1.27.1
[09:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:02:42] (PS1) Giuseppe Lavagetto: scap::dsh: add missing codfw hosts to conftool [puppet] - https://gerrit.wikimedia.org/r/306164
[09:04:18] (PS4) Nemo bis: Monthly update of the "slowest" querypages on the English Wikipedia [puppet] - https://gerrit.wikimedia.org/r/304696 (https://phabricator.wikimedia.org/T142936)
[09:10:53] (PS1) Filippo Giunchedi: add prometheus.svc to eqiad/codfw [dns] - https://gerrit.wikimedia.org/r/306165 (https://phabricator.wikimedia.org/T136313)
[09:12:58] (CR) Giuseppe Lavagetto: [C: 2] scap::dsh: add missing codfw hosts to conftool [puppet] - https://gerrit.wikimedia.org/r/306164 (owner: Giuseppe Lavagetto)
[09:13:05] (PS2) Filippo Giunchedi: add prometheus.svc to eqiad/codfw [dns] - https://gerrit.wikimedia.org/r/306165 (https://phabricator.wikimedia.org/T136313)
[09:15:53] (CR) Filippo Giunchedi: [C: 2] add prometheus.svc to eqiad/codfw [dns] - https://gerrit.wikimedia.org/r/306165 (https://phabricator.wikimedia.org/T136313) (owner: Filippo Giunchedi)
[09:19:32] (CR) Nikerabbit: "Glaisher:" [mediawiki-config] - https://gerrit.wikimedia.org/r/306162 (owner: Nikerabbit)
[09:20:35] (CR) Nikerabbit: Remove no longer relevant $wgTranslateTasks overrides (1 comment) [mediawiki-config]
- https://gerrit.wikimedia.org/r/306162 (owner: Nikerabbit)
[09:22:27] Operations, vm-requests, Patch-For-Review, Prometheus-metrics-monitoring: eqiad/codfw: 4 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2574740 (fgiunchedi) stalled>Open SSDs have been installed in ganeti in eqiad, I'm going to create two VMs there for prometheus, t...
[09:25:15] Operations, Monitoring, Prometheus-metrics-monitoring: test prometheus server - https://phabricator.wikimedia.org/T126785#2574744 (fgiunchedi) p:Low>Normal
[09:34:11] (PS1) Filippo Giunchedi: prometheus: add logging and rotation [puppet] - https://gerrit.wikimedia.org/r/306167 (https://phabricator.wikimedia.org/T126785)
[09:38:24] Operations, vm-requests, Patch-For-Review, Prometheus-metrics-monitoring: eqiad/codfw: 4 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2574754 (akosiaris) Seems fine. Don't forget to not make them DRBD but rather plain.
[09:45:01] (PS1) Filippo Giunchedi: puppet_compiler: noop for prometheus-ganglia generator [puppet] - https://gerrit.wikimedia.org/r/306170
[09:46:57] (CR) Filippo Giunchedi: [C: 2] prometheus: add logging and rotation [puppet] - https://gerrit.wikimedia.org/r/306167 (https://phabricator.wikimedia.org/T126785) (owner: Filippo Giunchedi)
[09:47:04] (PS2) Filippo Giunchedi: prometheus: add logging and rotation [puppet] - https://gerrit.wikimedia.org/r/306167 (https://phabricator.wikimedia.org/T126785)
[09:53:23] (CR) Giuseppe Lavagetto: [C: -1] "I like this overall, one minor correction and I think we're GTG." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/305635 (owner: Muehlenhoff)
[09:54:34] (CR) Glaisher: [C: 1] "Okay."
[mediawiki-config] - https://gerrit.wikimedia.org/r/306162 (owner: Nikerabbit)
[10:01:32] Operations, Dumps-Generation, HHVM: Merge https://github.com/facebook/hhvm/commit/9d2be6c30b5b6dadf414692d0e7fbab5f9105b5f into build of next hhvm release - https://phabricator.wikimedia.org/T143648#2574764 (ArielGlenn)
[10:07:01] (CR) Gehel: [C: -1] "On the principles, this looks good, but it needs some clean up..." (4 comments) [puppet] - https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) (owner: MaxSem)
[10:07:23] !log upgrading hhvm on codfw mediawiki servers
[10:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:07:49] (CR) Gehel: WIP: discovery stats module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) (owner: MaxSem)
[10:24:19] (CR) Giuseppe Lavagetto: [C: 2] puppet_compiler: noop for prometheus-ganglia generator [puppet] - https://gerrit.wikimedia.org/r/306170 (owner: Filippo Giunchedi)
[10:24:39] (PS2) Giuseppe Lavagetto: Beta Scap: dsh groups in hieradata [puppet] - https://gerrit.wikimedia.org/r/306070 (owner: Thcipriani)
[10:25:03] (CR) Giuseppe Lavagetto: [V: 2] Beta Scap: dsh groups in hieradata [puppet] - https://gerrit.wikimedia.org/r/306070 (owner: Thcipriani)
[10:30:31] (PS5) Muehlenhoff: Provide override file for base::service_unit [puppet] - https://gerrit.wikimedia.org/r/305635
[10:30:33] (CR) Muehlenhoff: Provide override file for base::service_unit (2 comments) [puppet] - https://gerrit.wikimedia.org/r/305635 (owner: Muehlenhoff)
[10:40:24] (PS1) Filippo Giunchedi: mariadb: install node/mysql exporters in eqiad too [puppet] - https://gerrit.wikimedia.org/r/306174 (https://phabricator.wikimedia.org/T126757)
[10:40:29] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:41:01] (PS2) Filippo Giunchedi:
puppet_compiler: noop for prometheus-ganglia generator [puppet] - https://gerrit.wikimedia.org/r/306170
[10:45:59] (CR) Jcrespo: "Let's wait for the changes on mysqld-exported configuration." [puppet] - https://gerrit.wikimedia.org/r/306174 (https://phabricator.wikimedia.org/T126757) (owner: Filippo Giunchedi)
[10:48:09] (PS1) Filippo Giunchedi: install_server: provision prometheus100[12] [puppet] - https://gerrit.wikimedia.org/r/306176 (https://phabricator.wikimedia.org/T136313)
[10:50:10] (CR) Filippo Giunchedi: [C: 2] install_server: provision prometheus100[12] [puppet] - https://gerrit.wikimedia.org/r/306176 (https://phabricator.wikimedia.org/T136313) (owner: Filippo Giunchedi)
[10:51:32] (PS2) Filippo Giunchedi: install_server: provision prometheus100[12] [puppet] - https://gerrit.wikimedia.org/r/306176 (https://phabricator.wikimedia.org/T136313)
[10:53:19] (PS3) Filippo Giunchedi: install_server: provision prometheus100[12] [puppet] - https://gerrit.wikimedia.org/r/306176 (https://phabricator.wikimedia.org/T136313)
[10:53:24] (CR) Filippo Giunchedi: [V: 2] install_server: provision prometheus100[12] [puppet] - https://gerrit.wikimedia.org/r/306176 (https://phabricator.wikimedia.org/T136313) (owner: Filippo Giunchedi)
[10:57:17] RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:59:27] (PS1) Nikerabbit: Remove $wgTranslateEC [mediawiki-config] - https://gerrit.wikimedia.org/r/306178
[11:01:47] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0]
[11:09:28] (CR) Aude: [C: 1] Update outdated comment for Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/306161 (owner: Nikerabbit)
[11:10:00] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[11:16:20] (CR)
Giuseppe Lavagetto: [C: -1] "A small comment, LGTM otherwise." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305414 (owner: Ppchelko)
[11:16:26] <_joe_> mobrovac: ^^
[11:16:40] kk, looking
[11:16:51] <_joe_> I am also a bit usnure about combining a change to service::node with the actual config change for changeprop
[11:17:34] _joe_: you'd rather split them in two?
[11:17:52] <_joe_> mobrovac: well, let me check one thing
[11:18:12] <_joe_> I think the change to service::node is small enough though
[11:19:08] PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:19:24] yup
[11:19:31] and it's a no-op for everybody else
[11:19:37] ok, lemme address your comment
[11:20:13] (CR) Muehlenhoff: "labs instances don't use base::firewall by default. Most ferm services work fine in labs, but there's quite a few where that's not support" [puppet] - https://gerrit.wikimedia.org/r/305969 (owner: Muehlenhoff)
[11:22:29] !log Cutting MediaWiki branch 1.28.0-wmf.16 | T141551
[11:22:31] T141551: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551
[11:22:31] (PS7) Mobrovac: ChangeProp: Update config for the new driver [puppet] - https://gerrit.wikimedia.org/r/305414 (owner: Ppchelko)
[11:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:22:36] _joe_: ^
[11:23:18] RECOVERY - puppet last run on mw2244 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:25:01] (CR) Giuseppe Lavagetto: [C: 2] ChangeProp: Update config for the new driver [puppet] - https://gerrit.wikimedia.org/r/305414 (owner: Ppchelko)
[11:25:09] <_joe_> let's go
[11:26:46] <_joe_> merging and running puppet on scb2001/sca2001
[11:27:01] (CR) Gehel: [C: 2] logstash: new input for msgpack over UDP [puppet] - https://gerrit.wikimedia.org/r/306081 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[11:27:12] (PS2) Gehel:
logstash: new input for msgpack over UDP [puppet] - https://gerrit.wikimedia.org/r/306081 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[11:27:37] Damn, that +2 button is too big...
[11:28:59] <_joe_> mobrovac: uhm changeprop is not on scb2*?
[11:29:10] wat?
[11:29:32] it's there
[11:29:32] <_joe_> there were no puppet changes on scb2001...
[11:30:12] _joe_: maybe puppet ran before you started it? I can see the packages were installed
[11:30:23] <_joe_> uhm let me check
[11:30:24] ii librdkafka++1:amd64 0.9.1-1~jessie1
[11:30:41] <_joe_> yes
[11:30:59] <_joe_> it was running when I did puppet-merge, and was running while I disabled puppet...
[11:31:11] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:31:12] heh
[11:31:14] race
[11:31:14] <_joe_> heh say a race condition
[11:31:43] <_joe_> ok, let's go on
[11:34:00] <_joe_> mobrovac: puppet ran everywhere
[11:34:09] kk, restarting
[11:36:17] (CR) Gehel: [C: 1] Disable unprivileged user namespaces on trusty systems [puppet] - https://gerrit.wikimedia.org/r/304474 (https://phabricator.wikimedia.org/T142567) (owner: Muehlenhoff)
[11:39:31] _joe_: kk, looking good
[11:39:33] _joe_: thnx!
[11:41:40] (PS1) Ema: cache_upload: switch to file storage backend on Varnish 4 [puppet] - https://gerrit.wikimedia.org/r/306180 (https://phabricator.wikimedia.org/T142810)
[11:41:48] Operations, vm-requests, Patch-For-Review, Prometheus-metrics-monitoring: eqiad/codfw: 4 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2574943 (fgiunchedi) Open>Resolved all four VMs provisioned, resolving, thanks @akosiaris !
[11:55:34] (PS1) Ema: standard_packages: add moreutils [puppet] - https://gerrit.wikimedia.org/r/306183
[11:57:07] (CR) Muehlenhoff: [C: 1] standard_packages: add moreutils [puppet] - https://gerrit.wikimedia.org/r/306183 (owner: Ema)
[12:09:08] (PS1) Muehlenhoff: Update to 4.4.19 [debs/linux44] - https://gerrit.wikimedia.org/r/306186
[12:11:05] (CR) Muehlenhoff: [C: 2] Update to 4.4.19 [debs/linux44] - https://gerrit.wikimedia.org/r/306186 (owner: Muehlenhoff)
[12:22:33] (CR) Muehlenhoff: [C: 2] wdqs: Limit to domain networks [puppet] - https://gerrit.wikimedia.org/r/306027 (owner: Muehlenhoff)
[12:22:38] (PS2) Muehlenhoff: wdqs: Limit to domain networks [puppet] - https://gerrit.wikimedia.org/r/306027
[12:31:36] (CR) Nemo bis: [C: 1] Remove $wgTranslateEC [mediawiki-config] - https://gerrit.wikimedia.org/r/306178 (owner: Nikerabbit)
[12:35:59] Operations, Discovery, Maps, Patch-For-Review: Maps - remove multiple JVM versions from maps servers - https://phabricator.wikimedia.org/T142977#2575221 (Gehel) New osmosis package (`osmosis_0.43.1-3+deb8u1+wmf1_all.deb`) has been uploaded to carbon and upgraded on maps1001. I'll let the osm sync...
[12:41:37] (PS1) Hashar: Group0 to 1.28.0-wmf.16 [mediawiki-config] - https://gerrit.wikimedia.org/r/306192 (https://phabricator.wikimedia.org/T141551)
[12:43:56] !log on tin dropping stall versions /srv/mediawiki-staging/php-1.28.0-wmf.{8,9,10} T141551
[12:43:57] T141551: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551
[12:44:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:45:06] !log citoid deploying f711219
[12:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:52:28] grabbing a snack and will show up for swat window
[12:54:57] i think there's nothing for swat
[12:59:45] aude: there is :) https://wikitech.wikimedia.org/wiki/Deployments#Tuesday.2C.C2.A0August.C2.A023
[12:59:57] hashar: want to pair on swat today?
[13:00:02] (PS1) Faidon Liambotis: smokeping: don't monitor install2001 [puppet] - https://gerrit.wikimedia.org/r/306194
[13:00:05] hashar, Dereckson, addshore, and aude: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T1300).
[13:00:05] Glaisher and Dereckson: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[13:00:06] I mean, on a hangout
[13:00:12] Hi, I'm here.
[13:01:03] o_O
[13:02:32] (CR) Faidon Liambotis: [C: 2] smokeping: don't monitor install2001 [puppet] - https://gerrit.wikimedia.org/r/306194 (owner: Faidon Liambotis)
[13:03:39] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[13:05:02] o/
[13:05:15] hashar: want to pair on swat today?
(hangout)
[13:05:36] snack house took longer than expected
[13:05:39] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[13:07:59] Pasted privately the hangout link for those willing to join
[13:08:45] Glaisher: lets process your patches :}
[13:09:17] (PS2) Hashar: Enable T143073 debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/305801 (https://phabricator.wikimedia.org/T143073) (owner: Glaisher)
[13:11:00] (CR) Hashar: [C: 2] Enable T143073 debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/305801 (https://phabricator.wikimedia.org/T143073) (owner: Glaisher)
[13:11:02] (PS1) Ottomata: Set up Zookeeper cluster for Druid [puppet] - https://gerrit.wikimedia.org/r/306196 (https://phabricator.wikimedia.org/T138263)
[13:11:31] (Merged) jenkins-bot: Enable T143073 debug log channel [mediawiki-config] - https://gerrit.wikimedia.org/r/305801 (https://phabricator.wikimedia.org/T143073) (owner: Glaisher)
[13:11:54] (PS2) Ottomata: Set up Zookeeper cluster for Druid [puppet] - https://gerrit.wikimedia.org/r/306196 (https://phabricator.wikimedia.org/T138263)
[13:12:53] it is on mw1099
[13:13:25] hashar: I can't test it. :)
[13:13:30] yeah
[13:13:32] let me scap it
[13:14:04] Glaisher: syncing
[13:14:34] not sure how well the scap is going to be
[13:14:40] since I have cut new wmf branches earlier
[13:14:52] so maybe it is going to scap the whole crap :(
[13:15:56] (PS5) Hashar: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: Glaisher)
[13:16:16] (CR) Giuseppe Lavagetto: [C: 2] Temporarily remove conf1003 from the client etcd list.
[dns] - 10https://gerrit.wikimedia.org/r/306157 (owner: 10Giuseppe Lavagetto) [13:16:18] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable T143073 debug log channel (duration: 02m 19s) [13:16:19] T143073: Fatal error: Argument 1 passed to MessageHandle::__construct() must be an instance of Title, null given - https://phabricator.wikimedia.org/T143073 [13:16:21] (03PS2) 10Giuseppe Lavagetto: Temporarily remove conf1003 from the client etcd list. [dns] - 10https://gerrit.wikimedia.org/r/306157 [13:16:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:37] Glaisher: Nikerabbit then there is the Translate related patch https://gerrit.wikimedia.org/r/#/c/306015/ [13:16:53] is that something you can test via mw1099 ? [13:17:23] I don't think there is an easy way to check it on the UI [13:17:37] Looks like the actual source language error is shown prior to checking the blacklist [13:17:43] I dont even know where Translate is installed [13:17:54] (03CR) 10Hashar: [C: 032] Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: 10Glaisher) [13:18:13] Meta, Wikidata, mw.org and other multi-lingual wikis ;) [13:18:19] (03Merged) 10jenkins-bot: Remove English for all groups from $wgTranslateBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306015 (https://phabricator.wikimedia.org/T124013) (owner: 10Glaisher) [13:18:25] (03PS1) 10ArielGlenn: scheduler: make json default object handler do something a bit more useful [dumps] - 10https://gerrit.wikimedia.org/r/306197 [13:18:54] hashar: Looks like I can test it at testwiki [13:19:06] Page language can be changed there [13:19:22] (03CR) 10Filippo Giunchedi: [C: 031] standard_packages: add moreutils [puppet] - 10https://gerrit.wikimedia.org/r/306183 (owner: 10Ema) [13:19:29] pulled on mw1099 [13:19:59] hashar: It's working [13:20:03] neat 
[13:20:07] https://test.wikipedia.org/w/index.php?title=Special:Translate&group=page-Finnish+translation+test&language=en&action=page&filter= previously showed the error [13:20:09] syncing [13:20:31] next is https://gerrit.wikimedia.org/r/#/c/306162/ [13:20:35] (03PS2) 10Hashar: Remove no longer relevant $wgTranslateTasks overrides [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306162 (owner: 10Nikerabbit) [13:20:55] Love the new SWAT process :) [13:21:00] (03PS2) 10ArielGlenn: scheduler: make json default object handler do something a bit more useful [dumps] - 10https://gerrit.wikimedia.org/r/306197 [13:21:01] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Remove English for all groups from $wgTranslateBlacklist T124013 (duration: 00m 54s) [13:21:02] T124013: Do not allow saving translations to blacklisted language codes - https://phabricator.wikimedia.org/T124013 [13:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:21:10] Glaisher: ho the process is quite old :} [13:21:11] couldn't test it previously [13:21:17] but a slot at this time of the day is new! [13:21:26] Been a long time since I did SWAT last time [13:21:40] (03CR) 10Hashar: [C: 032] Remove no longer relevant $wgTranslateTasks overrides [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306162 (owner: 10Nikerabbit) [13:21:42] (03CR) 10ArielGlenn: [C: 032] scheduler: make json default object handler do something a bit more useful [dumps] - 10https://gerrit.wikimedia.org/r/306197 (owner: 10ArielGlenn) [13:22:08] (03Merged) 10jenkins-bot: Remove no longer relevant $wgTranslateTasks overrides [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306162 (owner: 10Nikerabbit) [13:22:26] Glaisher: pulled on mw1099 if that can be tested [13:22:32] else I will scap [13:22:39] (03PS1) 10ArielGlenn: scheduler: make all except clauses catch specific exceptions [dumps] - 10https://gerrit.wikimedia.org/r/306198 [13:22:39] hashar: tested. 
works [13:22:42] neat [13:22:46] (at least the only one which I can check) [13:22:53] 'optional' [13:23:11] <_joe_> !log restarted pybal on lvs1011 for testing with etcd reboots [13:23:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:23:17] Glaisher: sounds good [13:23:31] so I think we are done with your three patches :) [13:23:31] (03CR) 10ArielGlenn: [C: 032] scheduler: make all except clauses catch specific exceptions [dumps] - 10https://gerrit.wikimedia.org/r/306198 (owner: 10ArielGlenn) [13:23:44] hashar: Thanks a lot. I got to leave now. [13:23:50] going to deploy Dereckson one [wmf15] 306190 Run LinksDeletionUpdate after commit() in namespaceDupes.php [13:23:54] Glaisher: see you another time and Danke ! [13:23:56] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Remove no longer relevant $wgTranslateTasks overrides (duration: 00m 48s) [13:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:25:08] 06Operations, 10ops-eqiad, 06DC-Ops, 06Labs, and 2 others: labstore1002 issues while trying to reboot - https://phabricator.wikimedia.org/T98183#2575363 (10chasemp) [13:25:10] 06Operations, 10ops-eqiad, 06DC-Ops, 06Labs, 10Labs-Infrastructure: Locate and assign some MD1200 shelves for proper testing of labstore1002 - https://phabricator.wikimedia.org/T101741#2575360 (10chasemp) 05Open>03Resolved a:03chasemp >>! In T101741#2574544, @yuvipanda wrote: > Did this happen? Doe... 
[13:25:23] zeljkof: git lg HEAD..HEAD@{u} [13:25:30] (03PS1) 10ArielGlenn: scheduler: move methods out into module that don't need 'self' [dumps] - 10https://gerrit.wikimedia.org/r/306200 [13:25:57] !log rebooting conf1003 for kernel update [13:26:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:26:31] (03CR) 10ArielGlenn: [C: 032] scheduler: move methods out into module that don't need 'self' [dumps] - 10https://gerrit.wikimedia.org/r/306200 (owner: 10ArielGlenn) [13:27:31] (03PS2) 10Ema: standard_packages: add moreutils [puppet] - 10https://gerrit.wikimedia.org/r/306183 [13:27:41] (03CR) 10Ema: [C: 032 V: 032] standard_packages: add moreutils [puppet] - 10https://gerrit.wikimedia.org/r/306183 (owner: 10Ema) [13:27:48] !log hashar@tin Synchronized php-1.28.0-wmf.15/maintenance/namespaceDupes.php: Run LinksDeletionUpdate after commit() in namespaceDupes.php T143631 (duration: 00m 52s) [13:27:49] T143631: namespaceDupes.php --merge can throw a DBUnexpectedError DatabaseBase::{closure}: Flushing an explicit transaction, getting out of sync! exception - https://phabricator.wikimedia.org/T143631 [13:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:28:28] !log European SWAT deploy is complete [13:28:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:28:38] (03PS1) 10ArielGlenn: scheduler: move cache related methods into a class [dumps] - 10https://gerrit.wikimedia.org/r/306201 [13:28:42] Hey, we could have tested that, running namespacesDupe on mw1099. I've one still pending, that's why I cherry-picked the fix. [13:28:49] * Dereckson tests in on Terbium [13:29:13] Works like a charm [13:29:32] Thanks AaronSchulz for the quick fix. 
[13:29:36] \O/ [13:31:58] (03PS1) 10Filippo Giunchedi: hieradata: add prometheus nodes for eqiad [puppet] - 10https://gerrit.wikimedia.org/r/306202 [13:34:00] 06Operations, 10Continuous-Integration-Infrastructure: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10ema) [13:34:30] (03CR) 10ArielGlenn: [C: 032] scheduler: move cache related methods into a class [dumps] - 10https://gerrit.wikimedia.org/r/306201 (owner: 10ArielGlenn) [13:34:42] !log Restarting Jenkins [13:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:35:23] just wanted to ask is jenkins down :) [13:36:06] (03CR) 10Nikerabbit: [C: 031] "Code looks good. A bit unclear from the bug that they really requested a new namespace." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/293243 (https://phabricator.wikimedia.org/T137200) (owner: 10Dereckson) [13:36:53] !log hashar@tin Started scap: testwiki to php-1.28.0-wmf.16 T141551 [13:36:54] 06Operations, 10Continuous-Integration-Infrastructure: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10Ottomata) +1 [13:36:54] T141551: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551 [13:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:37:12] !log remove some old compilations from puppet compiler filling up compiler02.eqiad.wmflabs disks T143671 [13:37:13] T143671: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671 [13:37:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:37:51] !log Run namespaceDupes maintenance script on azwiki and azwiktionary (T143580) [13:37:51] T143580: Namespace problem on Azerbaijani Wiktionary - 
https://phabricator.wikimedia.org/T143580 [13:37:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:38:29] (03PS1) 10ArielGlenn: scheduler: move email related methods into a class [dumps] - 10https://gerrit.wikimedia.org/r/306203 [13:38:42] ema: ottomata looks like godog is cleaning out the puppet compiler machine :) [13:39:14] 06Operations, 10Continuous-Integration-Infrastructure, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575460 (10hashar) [13:39:24] I am syncing the new 1.28.0-wmf.16 to test wiki [13:39:26] (03PS1) 10Giuseppe Lavagetto: Add back conf1003, remove conf1001 from the client etcd SRV record [dns] - 10https://gerrit.wikimedia.org/r/306204 [13:39:26] heh I didn't even notice the task before writing here, but yeah there were some old and big compilations taking up 50% of space [13:39:54] <_joe_> godog: there is a large one started by moritzm [13:40:29] great [13:40:47] _joe_: yeah I left that alone, but I suspect it has failed anyways [13:41:09] (03CR) 10Giuseppe Lavagetto: [C: 032] Add back conf1003, remove conf1001 from the client etcd SRV record [dns] - 10https://gerrit.wikimedia.org/r/306204 (owner: 10Giuseppe Lavagetto) [13:42:11] (03CR) 10ArielGlenn: [C: 032] scheduler: move email related methods into a class [dumps] - 10https://gerrit.wikimedia.org/r/306203 (owner: 10ArielGlenn) [13:42:50] PCC was slightly weird with that one, initially I had gotten reports for a few dozen hosts, but then the result page went blank, didn't think of a filled disk [13:43:01] (that was do doublecheck the service_unit patch) [13:47:24] (03PS1) 10Rush: nfs: cleanup nfs-mount-manager.sh references [puppet] - 10https://gerrit.wikimedia.org/r/306205 [13:48:23] !log rebooting conf1001 for kernel update [13:48:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:51:06] (03PS3) 10Rush: tools: 
Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [13:51:21] (03PS1) 10ArielGlenn: scheduler: make pylint not whine about unused args to signal handler [dumps] - 10https://gerrit.wikimedia.org/r/306207 [13:52:26] (03CR) 10ArielGlenn: [C: 032] scheduler: make pylint not whine about unused args to signal handler [dumps] - 10https://gerrit.wikimedia.org/r/306207 (owner: 10ArielGlenn) [13:53:02] (03CR) 10Rush: [C: 032] tools: Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [13:53:07] (03PS4) 10Rush: tools: Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [13:53:14] (03CR) 10Rush: [V: 032] tools: Add script that helps manage sge exec nodes [puppet] - 10https://gerrit.wikimedia.org/r/306025 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [13:55:26] (03PS2) 10Rush: nfs: cleanup nfs-mount-manager.sh references [puppet] - 10https://gerrit.wikimedia.org/r/306205 [13:55:51] (03PS1) 10Giuseppe Lavagetto: Re-adding conf1001 to the etcd SRV record [dns] - 10https://gerrit.wikimedia.org/r/306209 [13:56:14] (03CR) 10Giuseppe Lavagetto: [C: 032] Re-adding conf1001 to the etcd SRV record [dns] - 10https://gerrit.wikimedia.org/r/306209 (owner: 10Giuseppe Lavagetto) [13:56:22] (03PS1) 10Filippo Giunchedi: prometheus: bump open fds to 32768 [puppet] - 10https://gerrit.wikimedia.org/r/306210 [13:56:48] (03PS2) 10Filippo Giunchedi: prometheus: bump max open fds to 32768 [puppet] - 10https://gerrit.wikimedia.org/r/306210 [13:57:35] (03PS3) 10Rush: nfs: cleanup nfs-mount-manager.sh references [puppet] - 10https://gerrit.wikimedia.org/r/306205 [13:59:07] 66% of apaches synced [13:59:39] (03CR) 10Rush: [C: 032] nfs: cleanup nfs-mount-manager.sh 
references [puppet] - 10https://gerrit.wikimedia.org/r/306205 (owner: 10Rush) [14:00:09] (03PS1) 10ArielGlenn: scheduler: move slot related stuff out into a class for resource allocation [dumps] - 10https://gerrit.wikimedia.org/r/306211 [14:03:31] (03PS2) 10ArielGlenn: scheduler: move slot related stuff out into a class for resource allocation [dumps] - 10https://gerrit.wikimedia.org/r/306211 [14:04:24] (03CR) 10Giuseppe Lavagetto: [C: 032] Set up labs realm (ldap classifier and hiera) (031 comment) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/297902 (https://phabricator.wikimedia.org/T97081) (owner: 10Merlijn van Deen) [14:04:36] (03CR) 10Giuseppe Lavagetto: "very good job, thanks!" [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/297902 (https://phabricator.wikimedia.org/T97081) (owner: 10Merlijn van Deen) [14:05:21] (03CR) 10ArielGlenn: [C: 032] scheduler: move slot related stuff out into a class for resource allocation [dumps] - 10https://gerrit.wikimedia.org/r/306211 (owner: 10ArielGlenn) [14:05:39] !log hashar@tin Finished scap: testwiki to php-1.28.0-wmf.16 T141551 (duration: 28m 45s) [14:05:39] T141551: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551 [14:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:06:06] (03PS1) 10ArielGlenn: scheduler: move generation of unique id string out to the module [dumps] - 10https://gerrit.wikimedia.org/r/306212 [14:06:31] (03CR) 10Filippo Giunchedi: [C: 032] hieradata: add prometheus nodes for eqiad [puppet] - 10https://gerrit.wikimedia.org/r/306202 (owner: 10Filippo Giunchedi) [14:06:37] (03PS2) 10Filippo Giunchedi: hieradata: add prometheus nodes for eqiad [puppet] - 10https://gerrit.wikimedia.org/r/306202 [14:08:49] (03CR) 10Filippo Giunchedi: [C: 032] prometheus: bump max open fds to 32768 [puppet] - 10https://gerrit.wikimedia.org/r/306210 (owner: 10Filippo Giunchedi) [14:08:54] (03PS3) 10Filippo Giunchedi: 
prometheus: bump max open fds to 32768 [puppet] - 10https://gerrit.wikimedia.org/r/306210 [14:09:03] (03Merged) 10jenkins-bot: Set up labs realm (ldap classifier and hiera) [software/puppet-compiler] - 10https://gerrit.wikimedia.org/r/297902 (https://phabricator.wikimedia.org/T97081) (owner: 10Merlijn van Deen) [14:10:55] !log cleanup list of banned node from elasticsearch eqiad cluster [14:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:13:32] !log rebooting conf1002 for kernel update [14:13:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:17:37] 06Operations, 13Patch-For-Review: Switch to Linux 3.19 by default on jessie hosts - https://phabricator.wikimedia.org/T100773#2575692 (10MoritzMuehlenhoff) 05Open>03Resolved All jessie systems are now running 3.19 (T131928 for moving to 4.4) or 4.4. [14:19:48] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [14:23:30] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575726 (10Gehel) [14:24:48] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575745 (10Gehel) @Cmjohnson I should have checked the row allocation better when we received the new elast... 
[14:25:28] !log installing libgd security updates [14:25:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:27:04] (03PS2) 10ArielGlenn: scheduler: move generation of unique id string out to the module [dumps] - 10https://gerrit.wikimedia.org/r/306212 [14:28:06] (03CR) 10ArielGlenn: [C: 032] scheduler: move generation of unique id string out to the module [dumps] - 10https://gerrit.wikimedia.org/r/306212 (owner: 10ArielGlenn) [14:28:29] (03PS1) 10Hashar: contint: use require_package for php5 [puppet] - 10https://gerrit.wikimedia.org/r/306214 [14:29:10] (03PS1) 10ArielGlenn: scheduler: split up the check for running processses [dumps] - 10https://gerrit.wikimedia.org/r/306215 [14:30:45] (03CR) 10Hashar: "Cherry picked on CI puppet master :) and that fix puppet:" [puppet] - 10https://gerrit.wikimedia.org/r/306214 (owner: 10Hashar) [14:31:11] moritzm: got a puppet patch for you to prevent puppet duplicate definition of php5-dev on CI : https://gerrit.wikimedia.org/r/#/c/306214/ [14:31:22] applied on labs and works [14:31:51] hashar: ok, I'll have a look in half an hour or so [14:32:56] (03PS1) 10Kaldari: Switching Swedish Wikipedia to uca-sv-u-kn collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306216 (https://phabricator.wikimedia.org/T142113) [14:33:49] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2575752 (10jcrespo) Right now, es2004 still shows: ``` CRITICAL: 1 failed LD(s) (Degraded) ``` ``` Enclosure Device ID: 32 Slot Number: 10 Drive's position: DiskGroup: 0, Span:... 
[14:36:27] (03CR) 10Muehlenhoff: [C: 032] "Looks fine" [puppet] - 10https://gerrit.wikimedia.org/r/301627 (owner: 10Muehlenhoff) [14:36:33] (03PS3) 10Muehlenhoff: contint::firewall: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/301627 [14:37:07] (03CR) 10Muehlenhoff: contint::firewall: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/301627 (owner: 10Muehlenhoff) [14:37:51] (03CR) 10Muehlenhoff: [C: 032] "Looks good, also doublechecked with PCC" [puppet] - 10https://gerrit.wikimedia.org/r/306214 (owner: 10Hashar) [14:39:03] (03CR) 10Filippo Giunchedi: base/monitoring: add optional SMART disk check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/304580 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [14:39:28] 06Operations, 10Ops-Access-Requests, 10Analytics: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2575762 (10Andrew) a:03Ottomata this was discussed in the meeting, approved, and delegated to Andrew O. [14:39:54] Do we have anyone left in this channel who can change the topic? [14:40:38] let me try [14:42:00] (03CR) 10Dzahn: [C: 032] DNS: Add mgmt and production DNS for wqds200[12] Bug:T142864 [dns] - 10https://gerrit.wikimedia.org/r/306056 (https://phabricator.wikimedia.org/T142864) (owner: 10Papaul) [14:42:04] (03PS3) 10Dzahn: DNS: Add mgmt and production DNS for wqds200[12] Bug:T142864 [dns] - 10https://gerrit.wikimedia.org/r/306056 (https://phabricator.wikimedia.org/T142864) (owner: 10Papaul) [14:42:07] nope, getting an error message that I need to be channel op, that used to be different when I had clinic duty for the last time a few weeks ago [14:42:46] I think there was a security lock-down and we left the keys in the car. 
[14:44:18] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [14:46:03] \o/ that worked [14:46:34] but yeah I remember not being necessary, maybe the channel didn't use to be +t [14:47:46] (03PS1) 10Dzahn: fix wqds -> wdqs name typos [dns] - 10https://gerrit.wikimedia.org/r/306218 [14:50:28] (03PS2) 10Dzahn: fix wqds -> wdqs name typos [dns] - 10https://gerrit.wikimedia.org/r/306218 [14:50:49] godog: that is what happened, it was always mode -t until recently, because vandalism [14:50:59] (03PS3) 10Aklapper: Clarify string in weekly Phabricator Project email [puppet] - 10https://gerrit.wikimedia.org/r/303500 (https://phabricator.wikimedia.org/T142347) [14:51:00] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [14:52:06] (03PS1) 10Hashar: nodepool: bump nova client and openstack CLI [puppet] - 10https://gerrit.wikimedia.org/r/306220 (https://phabricator.wikimedia.org/T137217) [14:52:48] (03CR) 10Dzahn: [C: 032] "WikiDataQueryService" [dns] - 10https://gerrit.wikimedia.org/r/306218 (owner: 10Dzahn) [14:53:07] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [14:53:47] any idea why Phabricator #blocked-on-operations has been archived by andre__ on august 12th ? https://phabricator.wikimedia.org/project/profile/937/ [14:53:59] mutante: ah ok, thanks! 
I'm for restoring -t unless vandalism happens very frequently [14:54:26] godog: yep, agree [14:54:30] looks like all the #blocked-on got archived bah [14:55:54] thanks godog [14:56:14] 06Operations, 10ops-codfw, 06Discovery: rack/setup/deploy wdqs200[12] - https://phabricator.wikimedia.org/T142864#2575814 (10Dzahn) [14:57:39] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575726 (10faidon) >>! In T143685#2575745, @Gehel wrote: > Am I forgetting anything here? Is it possible to... [14:57:52] 06Operations, 10ops-codfw, 06Discovery: rack/setup/deploy wdqs200[12] - https://phabricator.wikimedia.org/T142864#2549225 (10Dzahn) When merging a DNS change i noticed last moment there is a typo throughout this ticket, in the title itself and from there to other places. It's "wdqs" not "wqds". renamed ticke... [14:58:53] hashar, see https://phabricator.wikimedia.org/T142734 [14:59:41] (03PS2) 10ArielGlenn: scheduler: split up the check for running processses [dumps] - 10https://gerrit.wikimedia.org/r/306215 [15:01:09] 06Operations, 10ops-codfw, 06Discovery: codfw: rack/setup/deploy wdqs200[12]switch configuration - https://phabricator.wikimedia.org/T143613#2575829 (10Papaul) [15:02:37] (03CR) 10ArielGlenn: [C: 032] scheduler: split up the check for running processses [dumps] - 10https://gerrit.wikimedia.org/r/306215 (owner: 10ArielGlenn) [15:02:44] (03PS1) 10Rush: nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 [15:03:07] (03CR) 10jenkins-bot: [V: 04-1] nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 (owner: 10Rush) [15:03:20] (03PS2) 10Rush: nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 [15:03:53] np andrewbogott ! 
[15:04:14] (03PS3) 10Rush: nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 [15:04:32] (03PS4) 10Rush: nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 [15:07:00] andre__: thank you :) [15:08:04] yw [15:09:02] (03CR) 10Hashar: [C: 031] "contint::firewall is only for gallium / contint1001 and indeed http should only be accessed locally or from the misc varnish. Lets deplo" [puppet] - 10https://gerrit.wikimedia.org/r/301627 (owner: 10Muehlenhoff) [15:10:37] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575856 (10Gehel) Thanks @faidon! I suspect there is also some switch configuration, update to racktables,... [15:15:49] (03CR) 10Madhuvishy: [C: 032] nfs-mount-manager: fix issues with clean and notes [puppet] - 10https://gerrit.wikimedia.org/r/306221 (owner: 10Rush) [15:17:00] (03PS1) 10ArielGlenn: scheduler: move process result checking out to a class [dumps] - 10https://gerrit.wikimedia.org/r/306224 [15:17:36] (03PS1) 10Muehlenhoff: Provide a systemd override unit for hhvm [puppet] - 10https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) [15:19:01] 06Operations, 06Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2563658 (10Volans) [15:20:58] (03CR) 10Rush: nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [15:22:20] (03CR) 10ArielGlenn: [C: 032] scheduler: move process result checking out to a class [dumps] - 10https://gerrit.wikimedia.org/r/306224 (owner: 10ArielGlenn) [15:29:17] (03PS1) 10ArielGlenn: scheduler: change scheduler class to take dict of 
plugables [dumps] - 10https://gerrit.wikimedia.org/r/306226 [15:30:16] (03PS2) 10Ottomata: Create deploy-aqs group [puppet] - 10https://gerrit.wikimedia.org/r/304020 (https://phabricator.wikimedia.org/T142101) (owner: 10Alexandros Kosiaris) [15:31:18] (03PS2) 10ArielGlenn: scheduler: change scheduler class to take dict of plugables [dumps] - 10https://gerrit.wikimedia.org/r/306226 [15:31:49] (03PS3) 10Ottomata: Create deploy-aqs group [puppet] - 10https://gerrit.wikimedia.org/r/304020 (https://phabricator.wikimedia.org/T142101) (owner: 10Alexandros Kosiaris) [15:34:06] (03CR) 10ArielGlenn: [C: 032] scheduler: change scheduler class to take dict of plugables [dumps] - 10https://gerrit.wikimedia.org/r/306226 (owner: 10ArielGlenn) [15:34:54] (03PS1) 10ArielGlenn: move id string setup into a module-level function [dumps] - 10https://gerrit.wikimedia.org/r/306227 [15:34:58] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic, 13Patch-For-Review: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2562534 (10awight) @AndyRussG There's a deadline for this, roughly Aug 23. It sounds like the next steps are,... [15:36:29] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic, 03Fundraising Sprint Pretending This Isn't Happening, and 2 others: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2575919 (10awight) p:05Low>03Unbreak! 
[15:36:37] (03CR) 10ArielGlenn: [C: 032] move id string setup into a module-level function [dumps] - 10https://gerrit.wikimedia.org/r/306227 (owner: 10ArielGlenn) [15:38:18] (03PS1) 10ArielGlenn: convert options and flags into dict for easier handling [dumps] - 10https://gerrit.wikimedia.org/r/306228 [15:40:45] (03CR) 10ArielGlenn: [C: 032] convert options and flags into dict for easier handling [dumps] - 10https://gerrit.wikimedia.org/r/306228 (owner: 10ArielGlenn) [15:41:47] (03PS1) 10ArielGlenn: scheduler: move plugables setup out into a module function [dumps] - 10https://gerrit.wikimedia.org/r/306229 [15:43:42] (03CR) 10Ottomata: [C: 032] Create deploy-aqs group [puppet] - 10https://gerrit.wikimedia.org/r/304020 (https://phabricator.wikimedia.org/T142101) (owner: 10Alexandros Kosiaris) [15:46:06] (03CR) 10Dzahn: [C: 032] Clarify string in weekly Phabricator Project email [puppet] - 10https://gerrit.wikimedia.org/r/303500 (https://phabricator.wikimedia.org/T142347) (owner: 10Aklapper) [15:46:11] (03PS4) 10Dzahn: Clarify string in weekly Phabricator Project email [puppet] - 10https://gerrit.wikimedia.org/r/303500 (https://phabricator.wikimedia.org/T142347) (owner: 10Aklapper) [15:46:19] 06Operations, 10Ops-Access-Requests, 10Analytics: Add analytics team members to group aqs-admins to be able to deploy pageview APi - https://phabricator.wikimedia.org/T142101#2575927 (10Ottomata) 05stalled>03Resolved [15:46:53] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2575928 (10Gehel) Checking the different segments, all have been upgraded to lucene 5.5.0 except the `grafana-dashboards` and `kibana-int` indi... 
[15:47:10] (03CR) 10ArielGlenn: [C: 032] scheduler: move plugables setup out into a module function [dumps] - 10https://gerrit.wikimedia.org/r/306229 (owner: 10ArielGlenn) [15:47:23] (03PS6) 10Madhuvishy: nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) [15:47:30] (03CR) 10Dzahn: [V: 032] Clarify string in weekly Phabricator Project email [puppet] - 10https://gerrit.wikimedia.org/r/303500 (https://phabricator.wikimedia.org/T142347) (owner: 10Aklapper) [15:48:33] (03PS1) 10ArielGlenn: scheduler: move log level setup out to a module function [dumps] - 10https://gerrit.wikimedia.org/r/306230 [15:48:35] (03CR) 10jenkins-bot: [V: 04-1] nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) (owner: 10Madhuvishy) [15:51:49] (03CR) 10ArielGlenn: [C: 032] scheduler: move log level setup out to a module function [dumps] - 10https://gerrit.wikimedia.org/r/306230 (owner: 10ArielGlenn) [15:53:00] (03PS1) 10ArielGlenn: scheduler: move setup of stdin for commands out to a module function [dumps] - 10https://gerrit.wikimedia.org/r/306231 [15:56:10] (03CR) 10ArielGlenn: [C: 032] scheduler: move setup of stdin for commands out to a module function [dumps] - 10https://gerrit.wikimedia.org/r/306231 (owner: 10ArielGlenn) [15:57:46] (03PS7) 10Madhuvishy: nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) [15:58:16] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2575934 (10Papaul) @jcrespo the disk is a brain new disk that was in a static plastic bag nerve used . 
[16:00:01] (03PS1) 10ArielGlenn: scheduler: move population of dict for opts out to module function [dumps] - 10https://gerrit.wikimedia.org/r/306233 [16:00:04] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T1600). Please do the needful. [16:00:04] thcipriani and ostriches: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:05] 06Operations, 10ops-codfw, 06Discovery: codfw: rack/setup/deploy wdqs200[12]switch configuration - https://phabricator.wikimedia.org/T143613#2573416 (10akosiaris) @papaul Switch port descriptions and configuration updated. Both ports placed in the corresponding private vlan. [16:00:14] o/ [16:00:51] o/ [16:02:25] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2575943 (10jcrespo) I believe you, I am just copying and pasting: "Firmware state: Failed" Either you changed the wrong disk- I *do not* believe that, the serial number seems di... [16:04:01] (03CR) 10ArielGlenn: [C: 032] scheduler: move population of dict for opts out to module function [dumps] - 10https://gerrit.wikimedia.org/r/306233 (owner: 10ArielGlenn) [16:05:03] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2575954 (10jcrespo) There is a more likely possibility- the controller has a problem with that particular port- the controler before didn't failed as usual, some if its informati... 
[16:09:30] 10Blocked-on-Operations, 06Operations, 10Recommendation-API: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2575967 (10leila) p:05Normal>03Low [16:10:07] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2561047 (10Volans) @jcrespo true but now it returns immediately, so maybe it was just not recognized? Maybe you could try to unplug it and plug it again. ``` es2004 0 ~$ time... [16:10:09] 10Blocked-on-Operations, 06Operations, 10Recommendation-API: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2229360 (10leila) Per discussions in backlog grooming, there is no dependency on sklearn at the moment, however, for future experiments and develo... [16:10:13] (03PS2) 10Andrew Bogott: Nova: update api-paste.ini.erb to conform with Liberty defaults [puppet] - 10https://gerrit.wikimedia.org/r/303434 [16:10:55] (03CR) 10BBlack: [C: 031] cache_upload: switch to file storage backend on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/306180 (https://phabricator.wikimedia.org/T142810) (owner: 10Ema) [16:11:53] (03PS2) 10Andrew Bogott: labspuppetbackend: Switch from json to yaml [puppet] - 10https://gerrit.wikimedia.org/r/303954 (owner: 10Yuvipanda) [16:12:30] thcipriani: Just you and me I guess :) [16:12:34] I can +1 if you +1 mine :p [16:13:27] heh, producitive puppet swat :) [16:13:37] 10Blocked-on-Operations, 06Operations, 10Recommendation-API: Backport python3-sklearn and python3-sklearn-lib from sid - https://phabricator.wikimedia.org/T133362#2229360 (10yuvipanda) I also think deb packaging for this is going town a long, unrecoverable rabbit hole, and would recommend a wheels setup simi... 
[16:14:37] (03CR) 10Andrew Bogott: [C: 032] labspuppetbackend: Switch from json to yaml [puppet] - 10https://gerrit.wikimedia.org/r/303954 (owner: 10Yuvipanda) [16:14:42] think I already +1'd 1 of yours at least :) [16:15:44] lol prolly [16:15:45] (03CR) 10Chad: [C: 031] scap: bump version to 3.2.3-1 [puppet] - 10https://gerrit.wikimedia.org/r/305078 (owner: 10Thcipriani) [16:15:55] (03CR) 10Chad: [C: 031] Beta: Add logstash port [puppet] - 10https://gerrit.wikimedia.org/r/303240 (owner: 10Thcipriani) [16:16:06] (03CR) 10Chad: [C: 031] Add the fatalmonitor query to logstash_checker [puppet] - 10https://gerrit.wikimedia.org/r/304327 (https://phabricator.wikimedia.org/T142784) (owner: 10Thcipriani) [16:16:10] :D [16:16:36] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575999 (10faidon) @gehel, yes, that is correct. Typically we reprovision servers when we move them around;... 
[16:20:32] 06Operations, 10scap: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576004 (10Joe) [16:21:06] 06Operations, 10scap: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576016 (10Joe) p:05Triage>03High [16:21:31] 06Operations, 10scap: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576004 (10Joe) [16:24:14] 06Operations, 10Analytics, 06Performance-Team, 10Traffic: Preliminary Design document for A/B testing - https://phabricator.wikimedia.org/T143694#2576043 (10Nuria) [16:27:08] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 3 others: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2576065 (10awight) Adding #fundraising-backlog so that this task makes it into our prioritizat... [16:27:34] !log rebooting es2004 for hardware maintenance T143220 [16:27:35] T143220: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220 [16:27:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:29:55] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2576084 (10Gehel) Reprovisionning might make sense (and is sufficiently automated on elasticsearch to be pa... 
[16:31:14] 06Operations, 10scap, 03Scap3, 15User-mobrovac: Scap::server::sources is out of sync with the repositories actually present on tin/mira - https://phabricator.wikimedia.org/T143692#2576089 (10mobrovac) [16:33:41] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2576128 (10Gehel) [16:38:54] (03PS2) 10Ema: cache_upload: switch to file storage backend on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/306180 (https://phabricator.wikimedia.org/T142810) [16:39:02] (03CR) 10Ema: [C: 032 V: 032] cache_upload: switch to file storage backend on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/306180 (https://phabricator.wikimedia.org/T142810) (owner: 10Ema) [16:39:19] PROBLEM - Disk space on scb1001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=84%) [16:39:38] looking ^ [16:39:58] papaul, es2004 should be down by now [16:40:05] waat? [16:40:17] godog: _joe_: moritzm: puppetswat is on? [16:41:27] 06Operations, 10Continuous-Integration-Infrastructure, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2576152 (10greg) p:05Triage>03High [16:41:45] something changed in the last 2 weeks [16:41:52] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2576153 (10jcrespo) Yes, it seems that it may need a reboot + configuration, that is the working thesis now. [16:43:12] 06Operations, 10Continuous-Integration-Infrastructure, 10puppet-compiler: OSError: [Errno 28] No space left on device on compiler02.puppet3-diffs.eqiad.wmflabs - https://phabricator.wikimedia.org/T143671#2575396 (10greg) (Set it to High, but I assume @fgiunchedi fixed it by removing the old compilations.) 
[16:43:13] ah, so celery is logging at info level [16:43:29] 06Operations: Upgrade mc1* cluster to Linux 4.4 - https://phabricator.wikimedia.org/T143695#2576158 (10MoritzMuehlenhoff) [16:43:31] RECOVERY - Disk space on scb1001 is OK: DISK OK [16:44:03] sigh sorry thcipriani ostriches mobrovac, completely forgot about puppet swat and jouncebot doesn't hilight my nick because $reasons [16:44:22] jynus: ok [16:44:56] I'm about to jump into a meeting in 15", ok to go on to thurs puppet swat unless there's something urgent ? [16:45:17] godog: i'll have a patch in 2 mins that i'd really need before deploying CP, if that's ok [16:46:10] godog: would be good to get the scap bump out since (a) it fixes a pretty bad bug and (b) if a new server spins up puppet will fail currenlty [16:46:16] 06Operations: Upgrade jessie systems from Linux 3.19 to 4.4 - https://phabricator.wikimedia.org/T131928#2576179 (10MoritzMuehlenhoff) Remaining hosts: mc1*: Tracked via T143695 ganeti2001-ganeti2006: Requires to migrate instances to other virt nodes before reboot nescio.wikimedia.org maerlant.wikimedia.org r... [16:46:35] ok, so thcipriani scap update coming up [16:46:46] it wasn't in the Deployments calendar earlier the day? [16:47:02] mobrovac: yeah put it in the calendar [16:47:07] godog: thank you [16:47:10] moritzm: IIRC it got added yesterday [16:47:21] (03PS1) 10Mobrovac: Change Prop: Delay the consumption start for one minute [puppet] - 10https://gerrit.wikimedia.org/r/306243 [16:47:24] for some reason tues puppet swat fell off the calendar [16:47:29] kk godog, thxn, it's ^ [16:47:38] (03PS2) 10Filippo Giunchedi: scap: bump version to 3.2.3-1 [puppet] - 10https://gerrit.wikimedia.org/r/305078 (owner: 10Thcipriani) [16:47:54] maybe my calender page was stale [16:47:58] (03CR) 10BryanDavis: "This has been cherry-picked on deployment-prep for months. Can we figure out what the right change is and get it merged?" 
[puppet] - 10https://gerrit.wikimedia.org/r/249490 (https://phabricator.wikimedia.org/T116898) (owner: 10Hashar) [16:48:17] (there's also the gcal entry if you want gcal to alert you) [16:49:09] (03CR) 10Filippo Giunchedi: [C: 032] scap: bump version to 3.2.3-1 [puppet] - 10https://gerrit.wikimedia.org/r/305078 (owner: 10Thcipriani) [16:49:56] (03CR) 10BryanDavis: "Cherry-picked on deployment-prep for months. Can we move this forward?" [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [16:50:39] greg-g: hah indeed that's a good idea, {{done}} [16:51:42] (03PS2) 10Filippo Giunchedi: Change Prop: Delay the consumption start for one minute [puppet] - 10https://gerrit.wikimedia.org/r/306243 (owner: 10Mobrovac) [16:51:48] thcipriani: {{done}} [16:51:58] godog: \o/ awesome [16:52:02] thanks [16:53:13] (03CR) 10Filippo Giunchedi: [C: 032] Change Prop: Delay the consumption start for one minute [puppet] - 10https://gerrit.wikimedia.org/r/306243 (owner: 10Mobrovac) [16:53:35] thcipriani: np! [16:54:07] mobrovac: {{done}} [16:54:16] * mobrovac checking [16:54:46] running puppet, that is [16:56:07] all good, thnx godog! [16:56:11] appreciate it [16:58:41] np, sorry I missed today's window [17:00:05] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T1700). [17:07:54] I guess I'll bump it to another puppetswat window. 
[17:07:56] Third time :) [17:09:18] (03PS2) 10BryanDavis: Add output plugin for Sentry [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/262747 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [17:09:33] (03CR) 10BryanDavis: [C: 032 V: 031] Add output plugin for Sentry [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/262747 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [17:09:54] (03CR) 10BryanDavis: [V: 032] Add output plugin for Sentry [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/262747 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [17:19:13] (03CR) 10BryanDavis: "* Deployed in deployment-prep" [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/262747 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [17:27:48] bblack: hi! thanks for opening and working on the CN GeoIP bug!! Is there somewhere where I can learn more about what's going on with our setup? Or do you maybe have time for a quick hangout? Thanks in advance!!! [17:27:58] fr-tech ^ [17:29:28] AndyRussG: can u jump into the tech talk hangout... we were just talking about that [17:29:30] AndyRussG: I'm in another meeting at the moment. I can sync up after (on the hour). The TL;DR is the GeoIP cookie here in WMF now supports v6 users (no need for fallback to HTTP request), and we'd like to quickly disable that fallback, at least in WMF config today. [17:30:04] awight knows all though, he can sync you up too :) [17:30:20] bblack: agreed, thanks again! [17:30:49] bblack: ah ok thx!! I'll talk to awight then only ping if we need any more pointers... awight: coming! 
[17:34:18] 06Operations, 07Puppet, 06Release-Engineering-Team: Preload TestingAccessWrapper in production mwrepl - https://phabricator.wikimedia.org/T143607#2576402 (10greg) [17:35:31] !log changeprop deploying 519ad9d [17:35:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:41:42] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2576450 (10debt) a:03Gehel [17:42:26] (03CR) 10BryanDavis: "Some version of this is cherry-picked on deployment-prep, but the version there and PS3 here don't match." [puppet] - 10https://gerrit.wikimedia.org/r/263024 (https://phabricator.wikimedia.org/T85239) (owner: 10Gergő Tisza) [17:42:45] (03CR) 10Smalyshev: [C: 031] WDQS caching headers [puppet] - 10https://gerrit.wikimedia.org/r/306163 (https://phabricator.wikimedia.org/T137238) (owner: 10Gehel) [17:45:19] 06Operations, 06Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2576496 (10mmodell) ClusterShell is really nice in many ways. 
[17:52:26] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2576534 (10Gehel) a:03Gehel [17:53:16] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, and 2 others: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2575726 (10Gehel) [17:53:59] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search-Backlog, and 2 others: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2576545 (10Gehel) [17:55:50] (03PS1) 10Dereckson: Fix UI l10n for Help page link on commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) [17:58:03] Hello. I've added this change to the morning SWAT ^ [18:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T1800). [18:00:04] MatmaRex and kaldari: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:28] new swat time is weird :) [18:00:39] was unexpected :P [18:01:01] hiho [18:01:10] bd808: still confuses me, too [18:01:12] finnaly it doesn't conflict with any meetings for me :P [18:01:16] finally* [18:02:48] I can SWAT today [18:04:30] 06Operations, 10ops-codfw, 10DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2576561 (10jcrespo) I've put it down and downtime'd it for a day, @papaul feel free to start it and do anything with it configuration-wise (it is not urgent). 
[18:04:50] I'm here for my swat [18:04:55] swat away [18:05:46] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Small nit, but LGTM in general" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [18:06:05] kaldari: cool, looks like I'll need to run updateCollation, how long has that been taking for these transitions (out of curiosity) [18:06:20] I can run it if you like... [18:06:54] Swedish has a pretty big categorylinks table, so I'm anticipating it'll take about 3-4 hours [18:07:03] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306216 (https://phabricator.wikimedia.org/T142113) (owner: 10Kaldari) [18:07:34] (03Merged) 10jenkins-bot: Switching Swedish Wikipedia to uca-sv-u-kn collation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306216 (https://phabricator.wikimedia.org/T142113) (owner: 10Kaldari) [18:07:35] kaldari: wowza, yeah, if you could run it that would be nice :) [18:08:18] PROBLEM - puppet last run on dbstore2001 is CRITICAL: CRITICAL: puppet fail [18:08:40] kaldari: assuming there's nothing to check here on mw1099, fine to go live everywhere? [18:09:10] it is live on mw1099 if you'd like to check it there [18:09:31] thcipriani: Yeah, I'll just check that the category pages aren't blowing up [18:09:32] ... 
[18:09:42] thanks :) [18:10:25] thcipriani: Looks good, let's sync it [18:10:30] ack, doing [18:11:30] (03CR) 10Dzahn: [C: 032] mediawiki: include fonts in role::mediawiki::webserver [puppet] - 10https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777) (owner: 10Dzahn) [18:11:35] (03PS6) 10Dzahn: mediawiki: include fonts in role::mediawiki::webserver [puppet] - 10https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777) [18:12:14] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:306216|Switching Swedish Wikipedia to uca-sv-u-kn collation (T142113)]] (duration: 00m 58s) [18:12:15] T142113: Test numeric sorting on Swedish Wikipedia - https://phabricator.wikimedia.org/T142113 [18:12:18] ^ kaldari live everywhere [18:12:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:12:44] (03PS3) 10Chad: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/301409 [18:12:52] (03CR) 10Chad: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [18:13:24] !log mwscript maintenance/updateCollation.php --wiki=svwiki --force [18:13:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:14:08] thcipriani: so far so good [18:14:29] thcipriani: I'll keep an eye on it until it's done [18:14:35] kaldari: glad to hear it, thanks for running that, appreciated :) [18:15:34] 06Operations, 06Operations-Software-Development: Evaluation of automation/orchestration tools - https://phabricator.wikimedia.org/T143306#2563658 (10Joe) I think you should at least look at mcollective (https://docs.puppet.com/mcollective/) Without going into details: the cons are it's created by puppetlabs a... 
[18:20:17] MatmaRex: your changes are live on mw1099, check please [18:20:22] 06Operations, 10Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2576654 (10BBlack) Following up a bit with observations now that we've had 2-3 weeks or so of ChaPoly stats to stare at: 1. Draft-mode is on the decline. I expect the timing of OpenSSL-1.1.x stabil... [18:20:27] 06Operations, 13Patch-For-Review: install font packages on all appservers, not just imagescalers (was: Install fonts-wqy-zenhei on all mediawiki app servers) - https://phabricator.wikimedia.org/T84777#2576655 (10Dzahn) fonts are now going to be installed on all appservers. example on mw1261, canary appserver,... [18:20:29] well, check wmf.15, I suppose, I'll sync both after [18:20:47] thcipriani: thanks. it's debug code for an unreproducible bug, so i can't really tell much ;) [18:21:20] heh, kk, I'll go ahead and sync it out. [18:21:31] Reedy: ^ your bug from 2014 is being resolved right now [18:22:56] !log all mw appservers are installing all the font packages now, not just imagescalers. 
this should fix some issues with EasyTimeline on zh projects and more [18:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:23:27] PROBLEM - DPKG on mw1278 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:23:51] ^ that is almost certainly just because it's in the middle of installing [18:24:27] PROBLEM - DPKG on mw1274 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:24:41] !log thcipriani@tin Synchronized php-1.28.0-wmf.16/resources/src/mediawiki.widgets/mw.widgets.CategoryCapsuleItemWidget.js: SWAT: [[gerrit:306235|Debug logging for "queue[title] undefined" (T139130)]] (duration: 00m 50s) [18:24:42] T139130: Uncaught TypeError: Cannot read property 'resolve' of undefined / queue[title] is undefined - https://phabricator.wikimedia.org/T139130 [18:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:28] RECOVERY - DPKG on mw1278 is OK: All packages OK [18:25:59] PROBLEM - DPKG on mw2068 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:26:05] !log thcipriani@tin Synchronized php-1.28.0-wmf.15/resources/src/mediawiki.widgets/mw.widgets.CategoryCapsuleItemWidget.js: SWAT: [[gerrit:306234|Debug logging for "queue[title] undefined" (T139130)]] (duration: 00m 50s) [18:26:06] T139130: Uncaught TypeError: Cannot read property 'resolve' of undefined / queue[title] is undefined - https://phabricator.wikimedia.org/T139130 [18:26:06] thanks [18:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:26:29] RECOVERY - DPKG on mw1274 is OK: All packages OK [18:26:35] intermittent DPKG from mw servers is the font install.. 
will fix itself [18:28:07] PROBLEM - DPKG on mw1270 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:28:14] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) (owner: 10Dereckson) [18:28:17] RECOVERY - DPKG on mw2068 is OK: All packages OK [18:28:18] PROBLEM - DPKG on mw2217 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:29:41] Project policy requires all submissions to be a fast-forward. [18:29:47] Manually rebasing. [18:29:47] PROBLEM - DPKG on mw1276 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:02] blerg. Been missing that ever since the gerrit update. [18:30:09] PROBLEM - DPKG on mw2073 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:14] http://tyler.zone/why-gerrit.gif [18:30:18] RECOVERY - DPKG on mw1270 is OK: All packages OK [18:30:28] RECOVERY - DPKG on mw2217 is OK: All packages OK [18:30:37] (03PS1) 1020after4: WIP: scap swat command [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306259 [18:30:42] 06Operations, 13Patch-For-Review: install font packages on all appservers, not just imagescalers (was: Install fonts-wqy-zenhei on all mediawiki app servers) - https://phabricator.wikimedia.org/T84777#2576682 (10Dzahn) 05Open>03Resolved a:03Dzahn [18:30:45] (03PS2) 10Dereckson: Fix UI l10n for Help page link on commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) [18:31:09] (03CR) 10Thcipriani: Fix UI l10n for Help page link on commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) (owner: 10Dereckson) [18:31:16] (03CR) 10Thcipriani: [C: 032] "SWAT again" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) (owner: 10Dereckson) [18:31:43] (03Merged) 10jenkins-bot: Fix UI l10n for Help page link on 
commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306255 (https://phabricator.wikimedia.org/T143564) (owner: 10Dereckson) [18:31:58] RECOVERY - DPKG on mw1276 is OK: All packages OK [18:32:35] Dereckson: patch is live on mw1099, check please [18:34:00] A cache nightmare to test. How do you force the sidebar to be regenerated? [18:34:09] PROBLEM - DPKG on mw1202 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:34:23] Dereckson: Purge from memcached? [18:34:28] RECOVERY - DPKG on mw2073 is OK: All packages OK [18:34:28] PROBLEM - DPKG on mw1257 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:34:45] Hi dosent this line https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/CommonSettings.php#L286 need updating? [18:34:48] PROBLEM - DPKG on mw1212 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:34:53] since the config was removed in mw 1.24 [18:34:57] (it's cached twice, pcache which action=purge will regenerate, but yeah memcached otherwsise) [18:35:15] paladox: Yes, but it's harmless :) [18:35:19] Oh ok [18:35:20] I thought someone already did nuke that tbh [18:35:35] Oh [18:35:42] ostriches what do i do for https://github.com/wikimedia/mediawiki-extensions-ConfirmAccount/blob/32dc740ef0d8233006221d244e98857fb32f7496/ConfirmAccount.config.php#L92 ? [18:35:48] RECOVERY - puppet last run on dbstore2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [18:35:53] please, since im unsure what to replace and how to replace it [18:35:54] ? [18:36:18] RECOVERY - DPKG on mw1202 is OK: All packages OK [18:36:28] (03PS4) 10Dzahn: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [18:36:28] Replace...what? I'm confused. 
[18:36:36] I don't think you need to worry about it :) [18:36:38] RECOVERY - DPKG on mw1257 is OK: All packages OK [18:36:38] Replace $wgFileStore [18:36:47] the "upload7" part ? [18:36:58] RECOVERY - DPKG on mw1212 is OK: All packages OK [18:37:42] paladox: You can just leave it alone :) [18:37:48] PROBLEM - DPKG on mw2205 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:37:48] PROBLEM - DPKG on mw1186 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:37:48] PROBLEM - DPKG on mw2239 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:37:49] PROBLEM - DPKG on mw2215 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:37:49] Oh [18:38:03] If it ain't broken, don't fix it ;-) [18:38:05] But it seems to cause problems like it wont save in the image folder [18:38:09] PROBLEM - DPKG on mw2124 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:38:31] stops icinga-wm for a second [18:38:37] PROBLEM - DPKG on mw1176 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:38:52] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2576767 (10EBernhardson) [18:39:22] oh, puppet disabled on neon/icinga [18:39:29] PROBLEM - DPKG on mw1277 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:39:44] https://gerrit.wikimedia.org/r/#/c/258373/6 [18:39:45] ? 
[18:39:58] RECOVERY - DPKG on mw2205 is OK: All packages OK [18:39:58] RECOVERY - DPKG on mw1186 is OK: All packages OK [18:39:58] RECOVERY - DPKG on mw2239 is OK: All packages OK [18:39:59] RECOVERY - DPKG on mw2215 is OK: All packages OK [18:40:09] PROBLEM - DPKG on mw1175 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:40:18] RECOVERY - DPKG on mw2124 is OK: All packages OK [18:40:36] thcipriani: ostriches: so we need to identify what key is involved, and run mcc on commonwiki [18:40:47] i'm watching that on the web ui and will get the bot back once it calms down [18:42:04] (03PS1) 10BBlack: recdns: remove hydrogen from LVS nameservers_override [puppet] - 10https://gerrit.wikimedia.org/r/306262 [18:42:05] Dereckson: commonswiki:sidebar:$langcode [18:42:26] Ah, thanks. Trying that. [18:43:16] Skin.php : $cache->makeKey( 'sidebar', $this->getLanguage()->getCode() ) [18:43:16] :) [18:43:26] dereckson@terbium:~$ mwscript mcc.php --wiki commonswiki [18:43:29] > delete commonswiki:sidebar:fr [18:43:31] MemCached error [18:44:09] Yay for being verbose! [18:44:23] Could do it from eval ;-) [18:44:34] MessageCache::singleton()->delete( $key ) [18:45:50] (03CR) 10Dzahn: [C: 032] "yes please, we picked hydrogen first" [puppet] - 10https://gerrit.wikimedia.org/r/306262 (owner: 10BBlack) [18:48:02] There is a replace and a get methods, but not any more delete. [18:50:03] 06Operations, 13Patch-For-Review: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2576843 (10Dzahn) we picked hydrogen to start with. https://gerrit.wikimedia.org/r/#/c/306262/ removes it from /etc/resolv.conf on LVS servers after that we are going to depool it [18:51:02] hmmm I've an idea to force the cache to regenerate, remove a localised message page [18:52:23] Not a good one. [18:52:58] Deleted MediaWiki:Helppage/fr, but still got the content from previous deleted message, not MediaWiki:Helppage. 
[18:56:07] !log bblack@palladium conftool action : set/pooled=no; selector: dc=eqiad,cluster=dns,name=hydrogen.wikimedia.org [18:56:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:00:04] hashar: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T1900). Please do the needful. [19:00:20] lets unleash a new version [19:00:56] thcipriani: I'm offering to deploy the change, and leave a note on the task it will only be propagated once the sidebar / the message is purged. [19:01:22] Dereckson: sure that sounds good [19:01:32] hashar: hang on a second, still SWATting [19:01:32] there are restbase endpoint health issues [19:01:35] (and meanwhile find the exact key) [19:01:42] i am re-starting the icinga bot too [19:01:46] so they dont get unnoticed [19:01:52] (or how to get a more useful info than "MemCached error") [19:01:55] it was just off because of the unrelated other issues [19:02:16] Dereckson: uncommenting https://github.com/MegaBits/megabits_wiki/blob/master/maintenance/mcc.php#L142 [19:02:17] :P [19:02:31] oO [19:02:34] |log hashar syncing php-2.0 [19:02:48] lovely [19:02:51] no problems, take your time with SWAT [19:02:58] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy [19:02:58] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [19:03:15] I already got the code synced / updated. 
That is just going to be about switching group0 wikiversions.json [19:04:02] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:306255|Fix UI l10n for Help page link on commons.wikimedia.org (T143564)]] (duration: 00m 47s) [19:04:03] T143564: MediaWiki:Helppage no longer works - https://phabricator.wikimedia.org/T143564 [19:04:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:04:07] ^ Dereckson sync'd [19:04:14] ack'ed [19:04:18] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [19:04:19] hashar are we go to 1.28 to 2.0? [19:04:33] !log rebooting hydrogen [19:04:33] paladox: that is a recurring joke :] [19:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:04:39] brion said now 2.0 [19:04:39] Oh, lol [19:04:41] no [19:04:48] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [19:04:49] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [19:05:02] commit 595fa488da102e31324f180121cc36ffec9743ad [19:05:04] Date: Tue Jun 1 07:42:32 2004 +0000 [19:05:07] updated to work with new client [19:05:30] jouncebot: refresh [19:05:33] I refreshed my knowledge about deployments. [19:05:37] (the commit commenting the error reporting in mcc) [19:06:28] the text 5xx spike is RB? [19:06:48] hashar: SWAT is complete. [19:06:57] thcipriani: good job! 
[19:06:58] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:06:58] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:07:06] just finished SWAT, 500 spike *looks* unrelated AFAICT [19:07:57] PROBLEM - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:1:7a2b:cbff:fe09:c21 [19:08:09] might be network related [19:08:17] Filled https://phabricator.wikimedia.org/T143722 for mcc [19:08:19] no, that's hydrogen maint, the ping issue above [19:09:07] PROBLEM - Host 208.80.154.50 is DOWN: CRITICAL - Host Unreachable (208.80.154.50) [19:09:33] ACKNOWLEDGEMENT - Host 208.80.154.50 is DOWN: CRITICAL - Host Unreachable (208.80.154.50) daniel_zahn reinstall T123727 [19:09:33] ACKNOWLEDGEMENT - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:1:7a2b:cbff:fe09:c21 daniel_zahn reinstall T123727 [19:09:33] ^ also hydrogen maintenance [19:09:52] going to scap so [19:09:58] thank you brandon [19:10:50] (03PS2) 10Hashar: Group0 to 1.28.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306192 (https://phabricator.wikimedia.org/T141551) [19:11:44] (03CR) 10Hashar: [C: 032] Group0 to 1.28.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306192 (https://phabricator.wikimedia.org/T141551) (owner: 10Hashar) [19:12:11] (03Merged) 10jenkins-bot: Group0 to 1.28.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306192 (https://phabricator.wikimedia.org/T141551) (owner: 10Hashar) [19:13:39] ahrh https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Update_deploy_notes [19:13:47] I never did that step ever :( [19:13:55] !log hashar@tin Synchronized wikiversions.json: Group0 to 1.28.0-wmf.16 T141551 (duration: 00m 48s) [19:13:56] T141551: MW-1.28.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T141551 [19:14:00] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:15:01] oh my [19:15:26] https://www.mediawiki.org/wiki/Special:Version is still showing: 1.28.0-wmf.15 (e762a5c) [19:17:00] Hmm? [19:17:21] testwiki is fine though -checked earlier this afternoon and again right now- [19:17:32] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 50s) [19:17:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:53] hashar@tin:/srv/mediawiki-staging$ grep mediawiki wikiversions.json [19:17:53] "mediawikiwiki": "php-1.28.0-wmf.16", [19:18:13] hashar: hmm, IIRC something changed in the new scap version WRT to wikiversions, looking now. [19:18:16] Yeah I see it. [19:18:46] we may just need to run scap wikiversions-compile manually for the time being...looking now. [19:18:53] what did I break? [19:18:58] test2 is also on wmf.15 [19:19:01] So yeah, seems likely [19:19:15] Wait, did you just sync .json? [19:19:18] but http://test.wikipedia.org/wiki/Special:Version is at wmf.16 [19:19:22] from the sync I did a few hours ago [19:19:40] $ scap --version [19:19:40] usage: scap [-h] ... [19:19:40] scap: error: too few arguments [19:19:43] :D [19:19:44] Scap would've done it then [19:19:48] The recompile. [19:19:48] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: fix compile & sync of wikiversions [19:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:19:57] {{fixed}} [19:19:59] oh wikiversions.php [19:20:07] but I just did scap sync-file wikiversions.json [19:20:15] seriously [19:20:18] That's not what the guide says to do ;-) [19:20:23] oh. 
yeah that's not going to work [19:20:25] I haven't read the doc: [19:20:26] Sync the change across the cluster [19:20:26] scap sync-wikiversions "group0 to VERSION" [19:20:40] so that is an issue between chair and keyboard [19:20:48] sorry :( [19:20:55] No worries :) [19:21:03] * bd808 teaches hashar new tricks ;) [19:21:07] thank you all [19:21:13] ah, ok, good, I didn't see anything digging through the code :) [19:21:19] * thcipriani unpanics [19:21:24] * hashar teaches scap to invoke sync-wikiversions when syncing wikiversions.json [19:21:25] :D [19:21:59] * ostriches removes sync-file instead :p [19:22:02] the new hotness with sync-wikiversions is that it should be updating the co-masters [19:22:30] does anyone have the "update deploy notes" already set up? I feel lazy setting it up ( step is https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Update_deploy_notes ) [19:22:46] bd808: \o/ thank you for fixing that [19:23:00] hashar: you could be the first person smart enough to set that up as a jenkins job ;) [19:23:21] Heh, fun part is I usually get beat to posting the [19:23:23] *them [19:23:28] ^ [19:24:03] R.eedy was very good about doing that step. most others not so much [19:24:16] Florian usually beats me to it ;-) [19:24:18] bd808: oh I am no longer dealing with Jenkins and CI [19:24:31] he would even update the past release with the changes that happened via swat [19:24:32] what's broken? [19:24:33] others have outsmarted me already [19:24:39] audephone: Nothing :) [19:24:46] Oh [19:24:52] audephone: we did 1.28.0-wmf.16 to testwiki; fatalmonitor is all fine apparently [19:24:59] Thought the train was reverted [19:25:19] it is all fine audephone :] [19:25:39] :) [19:25:53] bd808: but yeah you are right, we can probably harness a good part of the deployment train in Jenkins. We had such discussions but there is a bunch of prerequisites to it :D [19:26:14] Train conducting is gonna change soon-ish anyway.
Exciting things a-coming :) [19:26:16] https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap updated! [19:26:22] hashar: the release notes part would be trivial I think [19:26:24] So I wouldn't spend a ton of effort on automating the existing madness! [19:26:34] dsh -m mediawiki npm install mediawiki@1.28.0-wmf.16 [19:26:53] yeah [19:26:55] gross [19:27:13] or one can just let puppet ensure => latest the mediawiki.deb [19:27:16] Need to refine my sms train notifications to say what was deployed [19:27:24] I guess I'll have more time to work on other projects if MW is migrated to node :) [19:27:33] bd808: +1000 [19:27:49] audephone: use https://tools.wmflabs.org/versions/ as your browser home page and never get lost! [19:28:04] :) [19:28:13] Or we could just let everyone edit the code directly on the servers and go back to serving from nfs. [19:28:21] INSTANT DEPLOY AS SOON AS YOU PRESS SAVE! [19:28:36] ostriches: only if we add a web editor [19:28:53] let's just put all the code in a new namespace on metawiki [19:29:15] MediaWiki? Oh wait.... [19:29:35] who needs git? wiki pages have history logs and diffs and blame and reverts too [19:29:43] NS_MADNESS ;-) [19:29:48] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [19:30:18] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [19:30:52] (03PS1) 10Dzahn: installserver: remove duplicate hydrogen entry [puppet] - 10https://gerrit.wikimedia.org/r/306272 [19:30:57] * bd808 wanders back to figuring out how to send uwsgi log events to logstash [19:31:32] Oh yeah we have log4j now.
[19:31:38] seems like 500's are mildly-elevated since whatever synced out at 13:15 UTC today, but hard to see against the backdrop of the most-recent spike [19:31:40] I should work up some .properties to use that [19:31:44] (03CR) 10Dzahn: [C: 031] "using the one where chromium and hydrogen are the same" [puppet] - 10https://gerrit.wikimedia.org/r/306272 (owner: 10Dzahn) [19:34:27] ostriches: ah yeah log4j is there any special trick to send its output to logstash ? We could use that for Jenkins as well [19:34:52] !log 1.28.0-wmf.16 to group0 looks successful. [19:34:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:36:08] hashar: Just need to configure Jenkins' log4j.properties file [19:37:08] (03CR) 10Dzahn: [C: 032] installserver: remove duplicate hydrogen entry [puppet] - 10https://gerrit.wikimedia.org/r/306272 (owner: 10Dzahn) [19:37:13] (03PS2) 10Dzahn: installserver: remove duplicate hydrogen entry [puppet] - 10https://gerrit.wikimedia.org/r/306272 [19:37:29] hashar: https://blog.lanyonm.org/articles/2015/12/29/log-aggregation-log4j-spring-logstash.html#log4j-over-tcp is what we setup [19:37:39] log4j socketappender [19:38:48] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:38:48] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [19:39:17] RECOVERY - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is UP: PING OK - Packet loss = 0%, RTA = 2.29 ms [19:39:43] Went with that since it exists in the standard log4j libraries and doesn't require an extra package (to go the json route for example) [19:40:37] RECOVERY - Host 208.80.154.50 is UP: PING OK - Packet loss = 0%, RTA = 1.13 ms [19:42:28] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] [19:42:52] 06Operations, 06Commons, 10Wikimedia-SVG-rendering, 10media-storage: 
Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2577161 (10kaldari) [19:43:49] PROBLEM - Recursive DNS on 2620:0:861:1:7a2b:cbff:fe09:c21 is CRITICAL: CRITICAL - Plugin timed out while executing system call [19:44:05] ^ it's installing right now [19:44:34] ACKNOWLEDGEMENT - Recursive DNS on 208.80.154.50 is CRITICAL: CRITICAL - Plugin timed out while executing system call daniel_zahn reinstall ongoing [19:44:34] ACKNOWLEDGEMENT - Recursive DNS on 2620:0:861:1:7a2b:cbff:fe09:c21 is CRITICAL: CRITICAL - Plugin timed out while executing system call daniel_zahn reinstall ongoing [19:45:17] ostriches: neat [19:47:33] filled https://phabricator.wikimedia.org/T143733 :D [19:48:27] (03PS1) 10Dzahn: Revert "recdns: remove hydrogen from LVS nameservers_override" [puppet] - 10https://gerrit.wikimedia.org/r/306275 [19:49:57] 06Operations, 06Commons, 10Wikimedia-SVG-rendering, 10media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2577186 (10kaldari) >we could exclude Times, Arial and Courier (since they are in fact covered by the Liberation fonts which we prefer fo... [19:55:26] !log re-signing new puppet certs for hydrogen, initial run, new salt key [19:55:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:55:48] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0] [19:57:06] 06Operations, 13Patch-For-Review: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2577212 (10Dzahn) hydrogen was in netboot.cfg twice with different partman recipe https://gerrit.wikimedia.org/r/#/c/306272/ had to racreset to see console output after reboot, booted into PXE, r... 
[20:00:09] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:00:09] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:00:29] PROBLEM - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:1:7a2b:cbff:fe09:c21 [20:07:32] (03PS2) 10MaxSem: WIP: discovery stats module [puppet] - 10https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) [20:08:43] (03CR) 10jenkins-bot: [V: 04-1] WIP: discovery stats module [puppet] - 10https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) (owner: 10MaxSem) [20:10:34] (03PS3) 10MaxSem: WIP: discovery stats module [puppet] - 10https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) [20:11:46] (03CR) 10jenkins-bot: [V: 04-1] WIP: discovery stats module [puppet] - 10https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) (owner: 10MaxSem) [20:18:19] (03CR) 10Dzahn: [C: 032] installserver: let hydrogen use raid1-1partition partman [puppet] - 10https://gerrit.wikimedia.org/r/306278 (owner: 10Dzahn) [20:18:24] (03PS2) 10Dzahn: installserver: let hydrogen use raid1-1partition partman [puppet] - 10https://gerrit.wikimedia.org/r/306278 [20:18:48] (03CR) 10Dzahn: [V: 032] installserver: let hydrogen use raid1-1partition partman [puppet] - 10https://gerrit.wikimedia.org/r/306278 (owner: 10Dzahn) [20:19:57] (03PS21) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [20:21:17] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [20:22:07] !log hydrogen - reinstalling one more time, wrong partitioning [20:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, 
Master [20:22:19] (03PS22) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [20:23:27] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [20:23:31] (03CR) 10Dzahn: [C: 032] Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [20:23:37] (03PS5) 10Dzahn: Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [20:23:55] (03CR) 10Dzahn: "was to be on puppet swat earlier" [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [20:24:37] (03PS23) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [20:25:47] RECOVERY - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is UP: PING OK - Packet loss = 0%, RTA = 1.26 ms [20:27:28] PROBLEM - Recursive DNS on 208.80.154.50 is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:28:20] ACKNOWLEDGEMENT - Recursive DNS on 208.80.154.50 is CRITICAL: CRITICAL - Plugin timed out while executing system call daniel_zahn reinstall [20:33:45] (03PS1) 10Madhuvishy: labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 [20:34:57] (03CR) 10jenkins-bot: [V: 04-1] labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 (owner: 10Madhuvishy) [20:35:45] 06Operations, 10Domains, 10Traffic, 10Wikimedia-Site-requests, 13Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2577334 (10Dzahn) 05Open>03stalled p:05Triage>03Normal [20:41:06] 06Operations, 10ops-codfw, 10DBA: es2004 has 
a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2577372 (10Papaul) @jcrespo the Raid controller is showing that it is saying the disk what you need to do is to put the new disk in the Raid10 see image below. {F4394285} {F4... [20:41:54] (03PS1) 10Andrew Bogott: Revert "labs: Depool labvirt1011" [puppet] - 10https://gerrit.wikimedia.org/r/306282 [20:41:57] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: puppet fail [20:42:27] (03CR) 10BryanDavis: [C: 04-1] "Need to roll back the logstash logging or find another encoding because jessie doesn't include the msgpack formatter." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [20:42:39] !Log hydrogen - signing new puppet cert [20:42:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:43:57] 06Operations, 07Puppet, 06Release-Engineering-Team: Preload TestingAccessWrapper in production mwrepl - https://phabricator.wikimedia.org/T143607#2577379 (10Mattflaschen-WMF) [20:44:06] (03CR) 10Andrew Bogott: [C: 032] Revert "labs: Depool labvirt1011" [puppet] - 10https://gerrit.wikimedia.org/r/306282 (owner: 10Andrew Bogott) [20:44:09] (03CR) 10Dzahn: "Error: Failed to apply catalog: Validation of Exec[scap::source init analytics/refinery for analytics/refinery] failed: 'scap deploy --ini" [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [20:44:17] ostriches: on tin. 
^ [20:44:45] (03PS2) 10Madhuvishy: labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 [20:44:49] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: Puppet has 1 failures [20:45:03] (03PS1) 10Dzahn: Revert "Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo" [puppet] - 10https://gerrit.wikimedia.org/r/306283 [20:46:58] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [20:48:08] (03CR) 10Dzahn: [C: 032] "Error: Failed to apply catalog: Validation of Exec[scap::source init analytics/refinery for analytics/refinery] failed: 'scap deploy --ini" [puppet] - 10https://gerrit.wikimedia.org/r/306283 (owner: 10Dzahn) [20:48:23] (03PS2) 10Dzahn: Revert "Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo" [puppet] - 10https://gerrit.wikimedia.org/r/306283 [20:51:49] (03PS24) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [20:52:09] Platonides: again [20:52:13] I see [20:52:15] :) [20:52:29] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [20:52:42] ^ that's why the revert above [20:52:56] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [20:53:27] PROBLEM - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:1:7a2b:cbff:fe09:c21 [20:54:08] ^ was added a second ago [20:54:09] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: puppet fail [20:54:14] ^ same as tin [20:54:43] ACKNOWLEDGEMENT - Host 2620:0:861:1:7a2b:cbff:fe09:c21 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:1:7a2b:cbff:fe09:c21 daniel_zahn puppet just added it [20:57:18] RECOVERY - Recursive DNS on 208.80.154.50 
is OK: DNS OK: 0.110 seconds response time. www.wikipedia.org returns 208.80.154.224 [20:57:43] (03PS25) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [20:58:27] PROBLEM - puppet last run on elastic2006 is CRITICAL: CRITICAL: puppet fail [20:58:28] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:59:09] (03PS1) 10Papaul: DHCP: Add MAC entris for wdqs200[12] Bug:T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306285 (https://phabricator.wikimedia.org/T142864) [21:00:17] 06Operations, 10ops-codfw, 06Discovery: codfw: rack/setup/deploy wdqs200[12]switch configuration - https://phabricator.wikimedia.org/T143613#2577438 (10Papaul) @akosiaris Thank you [21:02:05] (03PS2) 10Papaul: DHCP: Add MAC entries for wdqs200[12] Bug:T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306285 (https://phabricator.wikimedia.org/T142864) [21:02:09] (03PS3) 10Dzahn: DHCP: Add MAC entries for wdqs200[12] Bug:T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306285 (https://phabricator.wikimedia.org/T142864) (owner: 10Papaul) [21:05:18] PROBLEM - NTP peers on hydrogen is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [21:05:41] (03PS1) 10Andrew Bogott: The labs puppet backend now requires python3-yaml. 
[puppet] - 10https://gerrit.wikimedia.org/r/306289 [21:05:53] !log hydrogen - reinstall finished, re-added to salt, restarted ntpd [21:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:08:31] (03PS26) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [21:09:38] RECOVERY - NTP peers on hydrogen is OK: NTP OK: Offset 0.047835 secs [21:09:50] ^ confirmed in sync with chromium [21:11:42] (03CR) 10Dzahn: [C: 032] DHCP: Add MAC entries for wdqs200[12] Bug:T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306285 (https://phabricator.wikimedia.org/T142864) (owner: 10Papaul) [21:12:32] (03PS2) 10Dzahn: Revert "recdns: remove hydrogen from LVS nameservers_override" [puppet] - 10https://gerrit.wikimedia.org/r/306275 [21:14:42] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [21:19:03] (03PS27) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [21:24:03] mutante: Ugh that sucks considering I gave it a cwd [21:24:06] I'll revisit later. [21:24:36] ostriches: ok, yep. 
thanks [21:25:37] * mutante just wants to get hydrogen back in service [21:25:58] RECOVERY - puppet last run on elastic2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:27:02] !log dzahn@palladium conftool action : set/pooled=yes; selector: dc=eqiad,cluster=dns,name=hydrogen.wikimedia.org [21:27:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:27:46] !log redeploying WDQS GUI to fix examples breakage [21:27:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:28:13] aha [21:28:33] 06Operations, 10Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2577510 (10Eevans) [21:30:37] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] [21:31:17] 06Operations, 10Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2577511 (10Eevans) p:05Triage>03Normal [21:32:46] ^^^ looks redis related? https://logstash.wikimedia.org/goto/0f7847f7060a6871b997852d98f52811 [21:33:17] (03PS1) 10Papaul: Adding install params for wdqs200[1-2] T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306290 [21:33:20] no [21:34:20] "The serialization "Q0" is not recognized by the configured id builders [21:34:23] " [21:34:30] https://logstash.wikimedia.org/goto/b9f3090444914f459784fb4922cac74c [21:35:32] greg-g: since that is wikidata i guess it's related to [21:35:34] < SMalyshev> !log redeploying WDQS GUI to fix examples breakage [21:36:02] SMalyshev: ^^ is that a valid assumption? 
[21:36:12] I actually missed the !log from SMalyshev :) [21:36:15] the "Rate limit exceeded" on restbase looks wrong but it's "just" a WARN [21:36:26] reported https://phabricator.wikimedia.org/T143744 [21:36:50] (03PS1) 10ArielGlenn: scheduler: move command retry updating into its own method [dumps] - 10https://gerrit.wikimedia.org/r/306291 [21:39:07] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [21:39:14] ok, good [21:39:36] @icinga [21:40:08] yeah, it went down/is gone, maybe [21:40:43] 06Operations, 10Ops-Access-Requests: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2577532 (10Dzahn) >>! In T143465#2574323, @Legoktm wrote: > If it's urgent, requesting a shell account that requiring waiting for ops approval seems like the wrong approach to me :) Act... [21:41:59] SMalyshev: hola? [21:42:27] nope, not gone, it might alert again [21:43:10] (03PS2) 10ArielGlenn: scheduler: move command retry updating into its own method [dumps] - 10https://gerrit.wikimedia.org/r/306291 [21:44:09] (03CR) 10ArielGlenn: [C: 032] scheduler: move command retry updating into its own method [dumps] - 10https://gerrit.wikimedia.org/r/306291 (owner: 10ArielGlenn) [21:44:51] (03CR) 10Dzahn: [C: 032] "hydrogen is back and answers to "dig @hydrogen.wikimedia.org .. " requests" [puppet] - 10https://gerrit.wikimedia.org/r/306275 (owner: 10Dzahn) [21:45:27] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0] [21:46:25] same thing ^ [21:48:30] greg-g: sorry, not sure I got it - what is the problem there? 
[21:48:37] https://phabricator.wikimedia.org/T143744 [21:49:12] something is causing an error to occur far too frequently in wikibase [21:49:14] greg-g: looks like a wikidata problem, I'm 99.9% sure not aused by deploying wdqs gui [21:49:21] *caused [21:49:48] started a bit after 21:20 [21:50:12] so yeah, not you [21:50:14] hmm... is it possible to link it to a request? [21:50:44] Q0 is not a valid ID indeed but I wonder where it could be from [21:52:23] looks like API POST... maybe some bot is just trying to post garbage? [21:52:57] maybe? [21:54:38] it looks like somebody is editing the entity and putting Q0 there which is a bad ID. But I have no idea who's doing it [21:55:58] this is going to be annoying if it keeps alerting [21:56:25] 06Operations, 13Patch-For-Review: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2577586 (10Dzahn) {P3880} [21:58:50] well it looks like user error, I'm not sure whether those should end up in logstash... but if yes then to stop it we'd need to find the bot doing it I suppose [21:59:58] not sure what kind of message this is - is this an uncaught exception? Then it looks like exception handling is missing somewhere in Wikidata... [22:00:04] bd808: Dear anthropoid, the time has come. Please deploy Deploy Logstash plugins (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T2200). [22:00:22] ohi jouncebot [22:00:52] this should be quick. I just took a deploy window in case something goes horribly wrong [22:01:07] 06Operations, 13Patch-For-Review: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2577597 (10Dzahn) -- 21:05 mutante: hydrogen - reinstall finished, re-added to salt, restarted ntpd 20:42 mutante: hydrogen - signing new puppet cert 20:22 mutante: hydrogen - reinstalling one more...
[22:03:32] (03PS1) 10Rush: diamond: sge stats run from the services hosts [puppet] - 10https://gerrit.wikimedia.org/r/306294 [22:04:13] !log Updated tin:/srv/deployment/logstash/plugins to d18b1c6 (Add output plugin for Sentry) [22:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:04:24] all done [22:04:36] (03PS2) 10Rush: diamond: sge stats run from the services hosts [puppet] - 10https://gerrit.wikimedia.org/r/306294 (https://phabricator.wikimedia.org/T140999) [22:05:17] SMalyshev: yeah, my guess is bad error handling [22:05:59] greg-g: looks like this exception is not getting caught because it's in a different hierarchy than the ones that are getting caught... but I don't know enough of that code to say if it should be or not [22:06:08] * greg-g nods [22:06:24] (03CR) 10Rush: [C: 032] diamond: sge stats run from the services hosts [puppet] - 10https://gerrit.wikimedia.org/r/306294 (https://phabricator.wikimedia.org/T140999) (owner: 10Rush) [22:06:47] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:07:03] somebody from the wikidata team - like Daniel or aude or somebody else knowing the guts of it will need to look into it [22:10:20] 06Operations, 13Patch-For-Review: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2577608 (10Dzahn) NTP in sync with chromium: root@hydrogen:~# ntpdc -c peers | grep chrom +chromium.wikime 2620:0:861:1:20 3 128 377 0.00009 -0.012633 0.08954 and Icinga: NTP OK: Offset -0.006...
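Editorial aside on the 22:05:59 remark: the suspected failure mode (an exception slipping past a handler because it belongs to a different class hierarchy than the ones being caught) can be sketched in a few lines. This is only an illustration under assumptions, not Wikibase code: `StorageError`, `ParseError`, `parse_entity_id`, `handle_edit` and `handle_edit_fixed` are all hypothetical names, and the real code path was not identified in the channel.

```python
class StorageError(Exception):
    """Hypothetical base class that the request handler knows how to catch."""


class ParseError(ValueError):
    """Hypothetical ID-parsing error rooted in a different hierarchy."""


def parse_entity_id(serialization: str) -> int:
    """Return the numeric part of an ID such as 'Q42'; reject 'Q0' and garbage."""
    digits = serialization[1:]
    if not serialization.startswith("Q") or not digits.isdecimal() or int(digits) < 1:
        raise ParseError(f'The serialization "{serialization}" is not recognized')
    return int(digits)


def handle_edit(serialization: str) -> str:
    """Handler that only catches StorageError, so ParseError escapes uncaught."""
    try:
        parse_entity_id(serialization)
    except StorageError as err:
        return f"user error: {err}"
    return "ok"


def handle_edit_fixed(serialization: str) -> str:
    """Same handler, but it also catches the error from the other hierarchy."""
    try:
        parse_entity_id(serialization)
    except (StorageError, ParseError) as err:
        return f"user error: {err}"
    return "ok"
```

With these assumed classes, `handle_edit("Q0")` propagates `ParseError` to the caller (where it would be counted as a fatal), while `handle_edit_fixed("Q0")` turns it into an ordinary user-facing error.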
[22:10:41] 06Operations: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2577609 (10Dzahn) [22:10:52] 06Operations: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#1936549 (10Dzahn) a:03Dzahn [22:14:21] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2577648 (10Dzahn) [22:33:11] (03PS28) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [22:34:20] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [22:37:09] (03PS29) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [22:38:14] (03CR) 10jenkins-bot: [V: 04-1] Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [22:40:22] (03PS30) 10BryanDavis: Provision Striker via scap3 [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) [22:44:24] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2577786 (10Dzahn) hydrogen (DNS) upgraded to jessie. 
count down to 13 [22:45:27] 06Operations, 10Ops-Access-Requests: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2577787 (10Dzahn) p:05Triage>03Normal [22:47:46] 06Operations, 13Patch-For-Review: Audit/fix hosts with no RAID configured - https://phabricator.wikimedia.org/T136562#2577795 (10Dzahn) [hydrogen:~] $ cat /proc/mdstat | grep active md1 : active (auto-read-only) raid1 sda2[0] sdb2[1] md0 : active raid1 sda1[0] sdb1[1] [22:48:26] (03PS1) 10BryanDavis: Revert "logstash: new input for msgpack over UDP" [puppet] - 10https://gerrit.wikimedia.org/r/306299 [22:49:08] (03CR) 10BryanDavis: "I should have marked my original patch with a -1 while I was working on testing the whole pipeline, but I didn't know that Gehel was going" [puppet] - 10https://gerrit.wikimedia.org/r/306299 (owner: 10BryanDavis) [22:50:18] (03PS2) 10Dzahn: Adding install params for wdqs200[1-2] T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306290 (owner: 10Papaul) [22:52:13] (03CR) 10Dzahn: [C: 032] Adding install params for wdqs200[1-2] T142864 [puppet] - 10https://gerrit.wikimedia.org/r/306290 (owner: 10Papaul) [22:53:41] (03CR) 10Dzahn: "ping" [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [22:54:23] (03CR) 10Dzahn: "re-add me when you think it's ready" [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [22:54:40] (03CR) 10BryanDavis: "Added akosiaris as reviewer because I'm touching his service::uwsgi class with a change that will affect ORES." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [22:57:23] heh, gerrit, what i want is vote and then remove myself :) [23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160823T2300). Please do the needful. 
[23:00:35] There is currently nothing on the calendar. I'll have something to deploy in a few minutes for T143480. [23:00:36] T143480: Enable Related pages on french Wikinews (opt-in) - https://phabricator.wikimedia.org/T143480 [23:01:33] (03CR) 10Dzahn: "i would have expected this to be a noop on lead, or at least the template would stay the same, but http://puppet-compiler.wmflabs.org/3811" [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [23:06:48] James_F: there is a request from fr.wikinews to enable RelatedArticles. fr.wikinews has a tech-savvy community, and good for reporting. Would that be possible for you? [23:08:00] (03PS1) 10Ppchelko: Change-Prop: Removed unused config properties [puppet] - 10https://gerrit.wikimedia.org/r/306300 [23:08:29] /15/12 [23:09:13] (03CR) 10Dzahn: "eh, nevermind about the template change, ok" [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [23:10:54] (03CR) 10Ppchelko: "https://puppet-compiler.wmflabs.org/3812/" [puppet] - 10https://gerrit.wikimedia.org/r/306300 (owner: 10Ppchelko) [23:16:48] (03PS1) 10Dereckson: Enable Related Articles on fr.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306301 (https://phabricator.wikimedia.org/T143480) [23:17:42] greg-g: for a https://www.mediawiki.org/wiki/Extension:RelatedArticles deployment, enabled by default on wikivoyage and wikipedia as a beta feature, once the community agrees, do we need approval from the reading team too? [23:17:51] (03PS6) 10Dzahn: Gerrit: Minor config tidying to avoid puppet/init inconsistencies [puppet] - 10https://gerrit.wikimedia.org/r/304977 (owner: 10Paladox) [23:18:08] Dereckson: That's a Reading question, not Editing, sorry. [23:18:21] James_F: yes, just figured that [23:18:28] Thanks. [23:18:30] Oh, yes, sorry.
[23:19:58] (03CR) 10Dzahn: [C: 032] Gerrit: Minor config tidying to avoid puppet/init inconsistencies [puppet] - 10https://gerrit.wikimedia.org/r/304977 (owner: 10Paladox) [23:20:10] (03PS10) 10Dzahn: Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [23:20:57] (03CR) 10BBlack: [C: 031] WDQS caching headers [puppet] - 10https://gerrit.wikimedia.org/r/306163 (https://phabricator.wikimedia.org/T137238) (owner: 10Gehel) [23:21:41] jdlrobson: ping? [23:22:32] (03CR) 10Dzahn: [C: 032] Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [23:23:21] !log gerrit restarting for config changes 303355, 304977 [23:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:25:13] hey Dereckson [23:25:41] Dereckson: fine with me [23:25:44] Hello jdlrobson. We've got a request from fr.wikinews to get the Related Articles extension. [23:26:28] and seems fine with the community [23:26:33] so go for it [23:26:36] okay [23:28:23] !log restarted grrrit-wm, apache and gerrit on lead [23:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:03] (03PS2) 10Dzahn: Revert "Gerrit: Minor config tidying to avoid puppet/init inconsistencies" [puppet] - 10https://gerrit.wikimedia.org/r/306302 [23:29:13] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306301 (https://phabricator.wikimedia.org/T143480) (owner: 10Dereckson) [23:29:36] 50% fail rate, hrmm [23:29:40] (03Merged) 10jenkins-bot: Enable Related Articles on fr.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306301 (https://phabricator.wikimedia.org/T143480) (owner: 10Dereckson) [23:31:08] (03CR) 10Dzahn: "the minor tidying broke the service, more testing is needed even for minor things" [puppet] - 10https://gerrit.wikimedia.org/r/306302 (owner: 10Dzahn) [23:31:58] PROBLEM - puppet
last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures [23:32:29] (03PS1) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) [23:33:44] (03PS1) 10Dzahn: Revert "Gerrit: make auth_type configurable for labs" [puppet] - 10https://gerrit.wikimedia.org/r/306304 [23:34:24] (03CR) 10Dzahn: [C: 032] "sigh" [puppet] - 10https://gerrit.wikimedia.org/r/306304 (owner: 10Dzahn) [23:34:35] (03PS2) 10Dzahn: Revert "Gerrit: make auth_type configurable for labs" [puppet] - 10https://gerrit.wikimedia.org/r/306304 [23:34:51] Filed https://phabricator.wikimedia.org/T143756 from fatalmonitor, reported the issue on #wikidata. [23:35:16] s/50/100 [23:35:46] 306301 live on mw1099 [23:36:09] (03PS1) 10BBlack: zerofetch: send params in POST data on POST [puppet] - 10https://gerrit.wikimedia.org/r/306305 (https://phabricator.wikimedia.org/T143285) [23:36:28] (03CR) 10Dzahn: "Internal server error" [puppet] - 10https://gerrit.wikimedia.org/r/306304 (owner: 10Dzahn) [23:37:39] (03PS2) 10BBlack: zerofetch: send params in POST data on POST [puppet] - 10https://gerrit.wikimedia.org/r/306305 (https://phabricator.wikimedia.org/T143285) [23:37:49] (03CR) 10BBlack: [C: 032 V: 032] zerofetch: send params in POST data on POST [puppet] - 10https://gerrit.wikimedia.org/r/306305 (https://phabricator.wikimedia.org/T143285) (owner: 10BBlack) [23:38:11] (03CR) 10Dzahn: "please abandon" [puppet] - 10https://gerrit.wikimedia.org/r/303435 (https://phabricator.wikimedia.org/T141803) (owner: 10Paladox) [23:40:59] Works fine. 
[23:45:26] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Enable Related Articles on fr.wikinews (T143480) (duration: 00m 53s) [23:45:27] T143480: Enable Related pages on french Wikinews (opt-in) - https://phabricator.wikimedia.org/T143480 [23:45:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:45:46] SWAT done. [23:49:05] (03CR) 10Mobrovac: [C: 031] Change-Prop: Removed unused config properties [puppet] - 10https://gerrit.wikimedia.org/r/306300 (owner: 10Ppchelko) [23:55:27] (03Abandoned) 10Paladox: Gerrit: Support labs https [puppet] - 10https://gerrit.wikimedia.org/r/303435 (https://phabricator.wikimedia.org/T141803) (owner: 10Paladox) [23:55:50] (03PS2) 10BBlack: Remove geoiplookup DNS entries [dns] - 10https://gerrit.wikimedia.org/r/305422 (https://phabricator.wikimedia.org/T100902) [23:56:38] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures