[00:54:03] (PS1) Dzahn: icinga: move nsca_frack.cfg to public repo [puppet] - https://gerrit.wikimedia.org/r/352067 [00:55:10] (PS2) Dzahn: icinga: move nsca_frack.cfg to public repo [puppet] - https://gerrit.wikimedia.org/r/352067 [01:02:56] (CR) Dzahn: [C: 2] icinga: move nsca_frack.cfg to public repo [puppet] - https://gerrit.wikimedia.org/r/352067 (owner: Dzahn) [01:05:28] (CR) Dzahn: "no-op in prod on both machines" [puppet] - https://gerrit.wikimedia.org/r/352067 (owner: Dzahn) [01:10:13] RECOVERY - cassandra-a CQL 10.64.48.98:9042 on restbase1018 is OK: TCP OK - 0.036 second response time on 10.64.48.98 port 9042 [01:22:07] (PS1) Dzahn: icinga: move send_nsca.cfg to public repo [puppet] - https://gerrit.wikimedia.org/r/352070 [01:26:02] !log T163292: starting bootstrap of restbase1018-b [01:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:26:13] T163292: Failed disk / degraded RAID arrays: restbase1018.eqiad.wmnet - https://phabricator.wikimedia.org/T163292 [01:26:36] (PS2) Dzahn: icinga: move send_nsca.cfg to public repo [puppet] - https://gerrit.wikimedia.org/r/352070 [01:26:45] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/6303/" [puppet] - https://gerrit.wikimedia.org/r/352070 (owner: Dzahn) [02:31:47] Operations, ops-ulsfo: track the ops juniper kit in OIT den - https://phabricator.wikimedia.org/T160581#3237532 (RobH) Open>Resolved items have been handled, some shipped to ulsfo, eqiad, and the optics donated to oit hardware ewaste recycling.
[02:33:21] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.21) (duration: 08m 29s) [02:33:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:38:35] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri May 5 02:38:35 UTC 2017 (duration 5m 14s) [02:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:54:38] Operations, MediaWiki-JobQueue, MediaWiki-JobRunner, Patch-For-Review, and 2 others: jobqueue is full of refreshlinks duplicates after the switchover. - https://phabricator.wikimedia.org/T163418#3237547 (Krinkle) [03:11:50] Operations, codfw-rollout: url-downloader should be set up more redundantly - https://phabricator.wikimedia.org/T122134#1897515 (Krinkle) >>! In T122134#2440785, @akosiaris wrote: > The task of getting a second url-downloader instance per DC for HA is still stalled however, pending getting ganeti cross r... [04:08:04] Operations, Performance-Team, Patch-For-Review: webpagetest-alerts: Difference in size authenticated - https://phabricator.wikimedia.org/T164209#3237600 (Dzahn) We now see in Icinga: "CRITICAL: https://grafana.wikimedia.org/dashboard/db/webpagetest-alerts is alerting: **Difference in size anonymous,... [04:14:25] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237613 (Dzahn) [04:15:00] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237601 (Dzahn) btw, what is the right phab tag for fundraising-tech ?
[04:15:37] ACKNOWLEDGEMENT - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 102, down: 3, dormant: 0, excluded: 3, unused: 0; ge-2/0/8: down - lutetium; ge-11/0/6: down - db1025; ge-11/0/4: down - boron daniel_zahn https://phabricator.wikimedia.org/T164554 [04:16:45] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237615 (Dzahn) [04:19:02] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237616 (Dzahn) pfw means "payments firewall" [[ https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions | naming conventions ]], so this is a Fundraising Tech issue [04:19:54] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1931.70 Read Requests/Sec=3073.10 Write Requests/Sec=0.70 KBytes Read/Sec=30982.80 KBytes_Written/Sec=14.80 [04:20:03] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237601 (Cmjohnson) These hosts are being decommissioned. [04:21:37] !log scheduled long downtime for mailman I/O stats on fermium - until we find better ways to deal with the normal spikes causing alerts [04:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [04:22:16] ACKNOWLEDGEMENT - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=4921.40 Read Requests/Sec=2995.20 Write Requests/Sec=4.00 KBytes Read/Sec=30895.60 KBytes_Written/Sec=51.20 daniel_zahn too much false alerting due to normal spikes [04:23:44] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237622 (Dzahn) p:Triage>Low thanks @cmjohnson. lowering prio.
[04:25:23] PROBLEM - Apache HTTP on mw2218 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:26:13] RECOVERY - Apache HTTP on mw2218 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.054 second response time [04:28:53] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.60 Read Requests/Sec=3.60 Write Requests/Sec=0.50 KBytes Read/Sec=24.40 KBytes_Written/Sec=9.60 [05:02:43] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0; xe-3/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave] [05:04:43] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [05:48:23] (PS1) Marostegui: db-codfw.php: Repool db2059, depool db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/352076 (https://phabricator.wikimedia.org/T162539) [05:51:02] (PS2) Marostegui: db-codfw.php: Repool db2059, depool db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/352076 (https://phabricator.wikimedia.org/T162539) [05:52:15] (CR) Marostegui: [C: 2] db-codfw.php: Repool db2059, depool db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/352076 (https://phabricator.wikimedia.org/T162539) (owner: Marostegui) [05:53:21] (Merged) jenkins-bot: db-codfw.php: Repool db2059, depool db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/352076 (https://phabricator.wikimedia.org/T162539) (owner: Marostegui) [05:53:32] (CR) jenkins-bot: db-codfw.php: Repool db2059, depool db2052 [mediawiki-config] - https://gerrit.wikimedia.org/r/352076 (https://phabricator.wikimedia.org/T162539) (owner: Marostegui) [05:55:01] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2059, depool db2052 - T162539 T163548 (duration: 00m 40s) [05:55:05] !Deploy alter table on wikidatawiki.wb_terms
- db2052 - T162539 T163548 [05:55:08] gah [05:55:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:55:10] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539 [05:55:10] T163548: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548 [05:55:19] !log Deploy alter table on wikidatawiki.wb_terms - db2052 - T162539 T163548 [05:55:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:36:23] PROBLEM - Apache HTTP on mw2129 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:37:13] RECOVERY - Apache HTTP on mw2129 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.045 second response time [06:45:39] !log starting cache_upload upgrades to varnish 4.1.6-1wm1 [06:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:49:03] (CR) Ema: [C: 1] "I'm personally very happy about ignoring arrows!" [puppet] - https://gerrit.wikimedia.org/r/351225 (owner: Dzahn) [06:55:13] (CR) Muehlenhoff: [C: 1] "Agreed, that feature was more annoying than useful." [puppet] - https://gerrit.wikimedia.org/r/351225 (owner: Dzahn) [07:11:43] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 102, down: 0, dormant: 0, excluded: 3, unused: 0 [07:11:58] !log Deploy alter table on wikidatawiki.wb_terms - dbstore2001 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548 [07:12:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:12:18] Operations, netops: pfw-eqiad.wikimedia.org - 3 interfaces down (fundraising hosts) - https://phabricator.wikimedia.org/T164554#3237745 (ayounsi) Open>Resolved a:ayounsi >>! In T164554#3237613, @Dzahn wrote: > btw, what is the right phab tag for fundraising-tech ?
https://phabricator.wikimed... [07:19:44] Operations: Clean up wikimedia's apt repo - https://phabricator.wikimedia.org/T164521#3236503 (MoritzMuehlenhoff) IMO it's fine to keep these around, if only for IT historians to study our setup in decades to come :-) Since these debs are no longer referenced by our Packages files, there's no risk of machine... [07:22:33] Operations: Clean up wikimedia's apt repo - https://phabricator.wikimedia.org/T164521#3237752 (Paladox) Should I close as declined? [07:28:51] Operations, ops-codfw, hardware-requests: reclaim tempdb2001(WMF6407) to spares - https://phabricator.wikimedia.org/T164513#3237771 (Marostegui) [07:29:57] Operations, Performance-Team, Patch-For-Review: webpagetest-alerts: Difference in size authenticated - https://phabricator.wikimedia.org/T164209#3237772 (Peter) I'm still waiting on getting metrics for the new page, I just disabled the alerts until we have that. [07:30:01] Operations, codfw-rollout: url-downloader should be set up more redundantly - https://phabricator.wikimedia.org/T122134#3237773 (akosiaris) [07:30:05] Operations, ops-codfw, Patch-For-Review: codfw: ganeti2007-ganeti2008 racking and onsite setup task - https://phabricator.wikimedia.org/T164011#3237774 (akosiaris) [07:32:48] Operations, codfw-rollout: url-downloader should be set up more redundantly - https://phabricator.wikimedia.org/T122134#3237775 (akosiaris) >>! In T122134#3237561, @Krinkle wrote: >>>! In T122134#2440785, @akosiaris wrote: >> The task of getting a second url-downloader instance per DC for HA is still sta...
[07:38:44] (Abandoned) Giuseppe Lavagetto: Depool text esams due to outage [dns] - https://gerrit.wikimedia.org/r/351774 (owner: Giuseppe Lavagetto) [07:46:33] RECOVERY - cassandra-b CQL 10.64.48.99:9042 on restbase1018 is OK: TCP OK - 0.036 second response time on 10.64.48.99 port 9042 [07:49:15] !log Deploy alter table on wikidatawiki.wb_terms - dbstore1002 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548 [07:49:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:03:46] (CR) Alexandros Kosiaris: "Indeed. Fixed. thanks!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350861 (https://phabricator.wikimedia.org/T129136) (owner: Alexandros Kosiaris) [08:03:58] (PS4) Alexandros Kosiaris: librenms: Introduce scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/350861 (https://phabricator.wikimedia.org/T129136) [08:05:59] Operations, hardware-requests: Unmanaged switch for eqiad frack - https://phabricator.wikimedia.org/T164561#3237819 (ayounsi) [08:21:34] Operations: Reduce rpcbind use - https://phabricator.wikimedia.org/T106477#1469915 (fgiunchedi) >>! In T106477#3236038, @MoritzMuehlenhoff wrote: > nfs-common and rpcbind get installed during the initial d-i base installation. At this point our apt config to not install recommended packages is not yet in pla... [08:21:38] (PS2) Muehlenhoff: Build a stretch image [puppet] - https://gerrit.wikimedia.org/r/350843 (https://phabricator.wikimedia.org/T162042) [08:25:03] (CR) Giuseppe Lavagetto: [C: -2] "This is part of our coding standards, and of the official reccomendations for puppet coding.
Virtually all the code in the world out there" [puppet] - https://gerrit.wikimedia.org/r/351225 (owner: Dzahn) [08:27:54] (CR) Alexandros Kosiaris: [C: 2] librenms: Introduce scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/350861 (https://phabricator.wikimedia.org/T129136) (owner: Alexandros Kosiaris) [08:27:59] (PS5) Alexandros Kosiaris: librenms: Introduce scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/350861 (https://phabricator.wikimedia.org/T129136) [08:28:03] (CR) Alexandros Kosiaris: [V: 2 C: 2] librenms: Introduce scap3 deployment [puppet] - https://gerrit.wikimedia.org/r/350861 (https://phabricator.wikimedia.org/T129136) (owner: Alexandros Kosiaris) [08:34:11] (CR) Muehlenhoff: [C: 2] Build a stretch image [puppet] - https://gerrit.wikimedia.org/r/350843 (https://phabricator.wikimedia.org/T162042) (owner: Muehlenhoff) [08:34:17] (PS3) Muehlenhoff: Build a stretch image [puppet] - https://gerrit.wikimedia.org/r/350843 (https://phabricator.wikimedia.org/T162042) [08:34:23] PROBLEM - Keyholder SSH agent on naos is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [08:35:01] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add scap.cfg [software/librenms] - https://gerrit.wikimedia.org/r/351811 (https://phabricator.wikimedia.org/T129136) (owner: Alexandros Kosiaris) [08:37:36] Operations, DBA: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237844 (Marostegui) You think db1024 can go away now? It was the old old master (it is depooled) and as per T154485#3171631 we should be good to go.
[08:38:24] Operations, DBA: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237847 (jcrespo) Yes [08:39:35] Operations, DBA: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237850 (Marostegui) >>! In T162699#3237847, @jcrespo wrote: > Yes Great, will prepare the patches and merge them next week and let Chris know so he can do his part too. [08:42:31] I am going to test mydumper on dbstore2001 [08:42:50] not sure why naos is alarming now [08:42:57] maybe it was downtimed? [08:43:04] (CR) ArielGlenn: "The whitespace changes though are easily recognized as such. Also, the consistent spacing is easier on the eyes when reviewing code. My ." [puppet] - https://gerrit.wikimedia.org/r/351225 (owner: Dzahn) [08:43:19] no, it started 10 minutes ago [08:44:57] (PS1) Marostegui: db-codfw,db-eqiad.php: Decommission db1024 [mediawiki-config] - https://gerrit.wikimedia.org/r/352087 (https://phabricator.wikimedia.org/T162699) [08:46:01] (PS1) Marostegui: s2.hosts: Remove db1024 [software] - https://gerrit.wikimedia.org/r/352088 (https://phabricator.wikimedia.org/T162699) [08:46:09] !log swift codfw-prod: ms-be2001 - ms-be2012 weight 700 - T162785 [08:46:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:18] T162785: Decomission ms-be2001 - ms-be2012 - https://phabricator.wikimedia.org/T162785 [08:46:59] ahhh puppet-agent[26775]: (Salt::Grain[deployment_server]) Scheduling refresh of Exec[deployment_server_sync_all] [08:48:02] (CR) ArielGlenn: "It's the version of salt that makes the difference: 2016.3 and later use SHA256. I'd prefer a check of that sort."
[puppet] - https://gerrit.wikimedia.org/r/351914 (owner: Andrew Bogott) [08:48:09] right after this puppet run naos showed the keyholder not armed [08:48:30] !log re-arming keyholder on naos [08:48:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:23] RECOVERY - Keyholder SSH agent on naos is OK: OK: Keyholder is armed with all configured keys. [08:50:43] PROBLEM - Keyholder SSH agent on mira is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. [08:52:33] (PS4) Filippo Giunchedi: [WIP] Send 5xx from kafkatee to logstash [puppet] - https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) [08:53:32] what [08:53:35] again? [08:53:56] mmm I can see naos diamond[9472]: OK: Keyholder is armed with all configured keys. [08:54:00] (CR) Filippo Giunchedi: "Indeed we do have kafka in deployment-prep receiving webrequest logs from varnishkafka. I don't see kafkacat running anywhere but I'll try" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: Filippo Giunchedi) [08:54:07] (PS1) Ayounsi: Librensm syslog to listen on v6 IPs as well as v4 [puppet] - https://gerrit.wikimedia.org/r/352092 [08:54:32] godog: we could run kafkacat directly from one of the kafka hosts [08:55:25] ahhh mira :D [08:56:38] (PS1) Marostegui: mariadb: Get ready to decomission db1024 [puppet] - https://gerrit.wikimedia.org/r/352093 (https://phabricator.wikimedia.org/T162699) [08:57:34] elukey: yeah that'll work for a test [08:59:26] I hate telia now [09:00:22] Operations, ops-codfw: setup naos/WMF6406 as new codfw deployment server - https://phabricator.wikimedia.org/T162900#3237882 (fgiunchedi) Anything left to do here?
naos is effectively in service and mira should be decom'd (either with its NIC swapped or not in T162859) [09:00:29] (CR) Giuseppe Lavagetto: [C: -1] "The direction is correct and I overall think this looks correct, but given the size of the patch I didn't get into verifying every detail." (17 comments) [puppet] - https://gerrit.wikimedia.org/r/347006 (owner: Gehel) [09:00:53] !log re-arm keyholder on mira (new scap key added for librenms) [09:01:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:01:43] RECOVERY - Keyholder SSH agent on mira is OK: OK: Keyholder is armed with all configured keys. [09:03:50] elukey: or setup a similar role to oxygen in deployment-prep, even better probably if not already setup somewhere [09:04:39] godog: yep that one seems good too [09:04:46] it should be only a matter of spinning up a VM [09:05:26] (CR) Giuseppe Lavagetto: [C: 2] Dnsdisc: try multiple times on check_record [switchdc] - https://gerrit.wikimedia.org/r/351670 (https://phabricator.wikimedia.org/T164396) (owner: Volans) [09:06:21] elukey: from past experiences sadly "only a matter of" turns out to be practically always false :( [09:06:37] though the best one years ago I heard was "just a matter of programming" [09:07:20] ahahhaha [09:07:23] _joe_: the profiles should have class parameters? I seem to remember the opposite from your review of the similar change to elasticsearch / cirrus... [09:08:06] <_joe_> gehel: if that was done a long time ago, maybe [09:08:18] <_joe_> my first draft wanted things the way you wrote them [09:08:46] _joe_: Ok, I'll update elastic as well then... [09:08:50] <_joe_> but then needs of the labs UI made us adapt this standard, see https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization [09:09:11] <_joe_> "Profile classes should only have parameters that default to an explicit hiera calls with no fallback value."
[09:09:49] <_joe_> gehel: use this https://wikitech.wikimedia.org/wiki/Puppet_coding#A_working_example:_deployment_host as a reference [09:10:04] <_joe_> and well, we forgot to remove the system::role there [09:10:06] <_joe_> let me amend it [09:10:09] why the explicit hiera call and not automatic parameter binding with no default at all? [09:10:21] <_joe_> gehel: git grep :P [09:10:34] Ah yes, make sense! [09:10:56] <_joe_> gehel: the other reason is, I'd like sooner or later to just TURN OFF auto param lookup in hiera [09:11:46] yep, I like explicit as well! Ok, back to cleaning ... [09:12:43] PROBLEM - HP RAID on dbstore2001 is CRITICAL: CHECK_NRPE: Socket timeout after 50 seconds. [09:13:01] ^ that is probably because of the alter table running and overloading it, I will check [09:14:27] (CR) Marostegui: "This looks good: https://puppet-compiler.wmflabs.org/6305/" [puppet] - https://gerrit.wikimedia.org/r/352093 (https://phabricator.wikimedia.org/T162699) (owner: Marostegui) [09:16:25] Operations, DBA, Patch-For-Review: Decomissions old s2 eqiad hosts (db1018, db1021, db1024, db1036) - https://phabricator.wikimedia.org/T162699#3237920 (Marostegui) All the patches to decommission db1024 are ready now. I will merge next week and create an specific task for Chris for that host so he d... [09:16:32] also my backup [09:17:11] (CR) Giuseppe Lavagetto: [C: 2] t05_switch_traffic: increase verbosity [switchdc] - https://gerrit.wikimedia.org/r/351674 (https://phabricator.wikimedia.org/T164400) (owner: Volans) [09:17:11] Poor dbstore2001, not having an easy Friday [09:17:44] thanks _joe_ !
[09:19:05] (CR) Jcrespo: [C: 1] mariadb: Get ready to decomission db1024 [puppet] - https://gerrit.wikimedia.org/r/352093 (https://phabricator.wikimedia.org/T162699) (owner: Marostegui) [09:24:56] (CR) Giuseppe Lavagetto: [C: 2] t09_start_maintenance: clear systemctl state on dc_from [switchdc] - https://gerrit.wikimedia.org/r/351680 (https://phabricator.wikimedia.org/T164403) (owner: Volans) [09:36:12] (PS1) Muehlenhoff: labstore: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - https://gerrit.wikimedia.org/r/352097 (https://phabricator.wikimedia.org/T106477) [09:38:57] (PS1) Muehlenhoff: dumps: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - https://gerrit.wikimedia.org/r/352098 (https://phabricator.wikimedia.org/T106477) [09:53:34] (CR) Volans: [C: 1] "The code looks good, see some minor comments inline." (4 comments) [software/service-checker] - https://gerrit.wikimedia.org/r/351110 (owner: Giuseppe Lavagetto) [09:57:06] (CR) Filippo Giunchedi: ">" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: Filippo Giunchedi) [09:57:38] (PS5) Filippo Giunchedi: Send 5xx from kafkatee to logstash [puppet] - https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) [10:00:24] (CR) ArielGlenn: "Makes sense to me." [puppet] - https://gerrit.wikimedia.org/r/352098 (https://phabricator.wikimedia.org/T106477) (owner: Muehlenhoff) [10:00:38] (CR) ArielGlenn: [C: 1] dumps: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - https://gerrit.wikimedia.org/r/352098 (https://phabricator.wikimedia.org/T106477) (owner: Muehlenhoff) [10:01:13] forgot to c:1 along with the comment >_< [10:05:33] (CR) Elukey: [C: -1] "@BryanDavis what would be the data retention period for this data?
I am asking because I'd like to make sure that webrequest logs (even if" [puppet] - https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: Filippo Giunchedi) [10:07:02] gehel: I think that --^ is what you have been working on recently right ? [10:07:26] (CR) Volans: [C: -1] "Code looks good, I think there is a typo though. Minor comments inline." (4 comments) [software/service-checker] - https://gerrit.wikimedia.org/r/351882 (owner: Giuseppe Lavagetto) [10:12:33] RECOVERY - HP RAID on dbstore2001 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, Controller, Battery/Capacitor [10:16:41] Operations, DBA, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3238142 (jcrespo) [10:17:50] Operations, ops-eqiad, Analytics-Kanban, DC-Ops: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3238143 (elukey) @Cmjohnson any news on an1030? :) [10:22:45] Operations, ops-eqiad, Analytics-Cluster, Analytics-Kanban, User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3238164 (elukey) Tried again today: ``` ===== NODE GROUP ===== (1) analytics1060.eqiad.wmnet ----- OUTPUT of 'grep "Hardware e..... [10:26:30] Operations, ops-codfw, DBA: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238169 (jcrespo) Resolved>Open Pending to run compare.py around the ids obtained before.
[10:38:05] Operations, Traffic: Investigate nginx reload behavior - https://phabricator.wikimedia.org/T164579#3238212 (ema) p:Triage>Normal [10:44:26] Operations, DBA: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238218 (jcrespo) a:Marostegui>jcrespo [10:44:55] (PS1) Muehlenhoff: Strip nfs-common/rpcbind during jessie base installation [puppet] - https://gerrit.wikimedia.org/r/352105 (https://phabricator.wikimedia.org/T106477) [10:50:57] Operations, DBA: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3238231 (jcrespo) I am recovering db1015 again minus cebwiki. Better that leaving db1015 broken and doing nothing. I have disabled notifications on db1015 just in case. Crea... [10:55:00] Operations, DBA: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3238240 (jcrespo) a:jcrespo [10:55:13] (CR) Giuseppe Lavagetto: Add statsd support (4 comments) [software/service-checker] - https://gerrit.wikimedia.org/r/351882 (owner: Giuseppe Lavagetto) [10:55:19] (CR) Giuseppe Lavagetto: Parallelize url fetching (4 comments) [software/service-checker] - https://gerrit.wikimedia.org/r/351110 (owner: Giuseppe Lavagetto) [10:56:01] (PS3) Giuseppe Lavagetto: Parallelize url fetching [software/service-checker] - https://gerrit.wikimedia.org/r/351110 [10:56:03] (PS2) Giuseppe Lavagetto: Add statsd support [software/service-checker] - https://gerrit.wikimedia.org/r/351882 [10:56:44] Operations: Clean up wikimedia's apt repo regarding precise and lucid - https://phabricator.wikimedia.org/T164521#3238247 (Aklapper) p:Triage>Lowest [10:57:14] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [10:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:57:28] !log akosiaris@tin Finished deploy
[librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 13s) [10:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:34] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [10:58:36] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s) [10:58:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:16] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:00:20] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s) [11:00:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:33] (CR) Volans: [C: 1] "LGTM" [software/service-checker] - https://gerrit.wikimedia.org/r/351110 (owner: Giuseppe Lavagetto) [11:00:59] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:01:01] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s) [11:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:11] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:13] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 01s) [11:01:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:32] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided)
[11:02:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:45] (CR) Volans: [C: 1] "LGTM" (1 comment) [software/service-checker] - https://gerrit.wikimedia.org/r/351882 (owner: Giuseppe Lavagetto) [11:02:56] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 23s) [11:03:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:08:51] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:08:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:06] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 14s) [11:09:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:15:55] (CR) Filippo Giunchedi: [C: 1] "LGTM, minor nit about quoting the task number" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/352105 (https://phabricator.wikimedia.org/T106477) (owner: Muehlenhoff) [11:16:33] (PS1) Elukey: Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - https://gerrit.wikimedia.org/r/352111 [11:17:25] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:17:27] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s) [11:17:30] (CR) jerkins-bot: [V: -1] Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - https://gerrit.wikimedia.org/r/352111 (owner: Elukey) [11:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:48] (PS2) Elukey: Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - https://gerrit.wikimedia.org/r/352111 [11:18:31] (PS3) Elukey: Set Piwik cronjob archiver and tune Piwik's defaults [puppet] -
https://gerrit.wikimedia.org/r/352111 [11:18:34] sorry jenkins you are right [11:19:45] (CR) jerkins-bot: [V: -1] Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - https://gerrit.wikimedia.org/r/352111 (owner: Elukey) [11:21:49] (PS1) Alexandros Kosiaris: Remove create of librenms directory [puppet] - https://gerrit.wikimedia.org/r/352112 [11:22:24] (CR) Giuseppe Lavagetto: [C: 2] Parallelize url fetching [software/service-checker] - https://gerrit.wikimedia.org/r/351110 (owner: Giuseppe Lavagetto) [11:23:17] (CR) Alexandros Kosiaris: [C: 2] Remove create of librenms directory [puppet] - https://gerrit.wikimedia.org/r/352112 (owner: Alexandros Kosiaris) [11:23:22] (PS2) Alexandros Kosiaris: Remove create of librenms directory [puppet] - https://gerrit.wikimedia.org/r/352112 [11:23:27] (CR) Alexandros Kosiaris: [V: 2 C: 2] Remove create of librenms directory [puppet] - https://gerrit.wikimedia.org/r/352112 (owner: Alexandros Kosiaris) [11:24:04] (CR) Giuseppe Lavagetto: [C: 2] Add statsd support [software/service-checker] - https://gerrit.wikimedia.org/r/351882 (owner: Giuseppe Lavagetto) [11:26:20] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:26:22] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 02s) [11:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:47] grr scap is not creating the symlink [11:27:33] PROBLEM - puppet last run on netmon1001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures.
Failed resources (up to 3 shown): File[/srv/deployment/librenms/librenms/purge.py],File[/srv/deployment/librenms/librenms/config.php] [11:29:07] !log installing openjdk-8 security updates/cassandra restarts on restbase staging clusters [11:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:01] !log akosiaris@tin Started deploy [librenms/librenms@b25a5e9]: (no justification provided) [11:30:04] !log akosiaris@tin Finished deploy [librenms/librenms@b25a5e9]: (no justification provided) (duration: 00m 03s) [11:30:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:13] 06Operations, 10DBA: es2019 crashed again - https://phabricator.wikimedia.org/T149526#3238274 (10jcrespo) Left running on neodymium (I did some optimizations to ignore values below and beyond max id respectively): ``` while read db id; do echo "./compare.py es1019.eqiad.wmnet es2019.codfw.wmnet $db blobs_clust... [11:30:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:31:33] RECOVERY - puppet last run on netmon1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:41:04] (03PS1) 10Jcrespo: Increase my default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352113 [11:50:22] (03PS4) 10Elukey: Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - 10https://gerrit.wikimedia.org/r/352111 [11:52:54] (03CR) 10Elukey: [C: 032] Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - 10https://gerrit.wikimedia.org/r/352111 (owner: 10Elukey) [11:52:59] (03PS5) 10Elukey: Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - 10https://gerrit.wikimedia.org/r/352111 [11:53:04] (03CR) 10Elukey: [V: 032 C: 032] Set Piwik cronjob archiver and tune Piwik's defaults [puppet] - 10https://gerrit.wikimedia.org/r/352111 (owner: 10Elukey) [11:56:40] (03Draft3) 10Alexandros Kosiaris: Set checkReceivedObjects to false in config 
[software/librenms] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/352116 [11:56:46] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Set checkReceivedObjects to false in config [software/librenms] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/352116 (owner: 10Alexandros Kosiaris) [11:59:49] (03PS1) 10Elukey: Set correct minute to Piwik's archiver crontab [puppet] - 10https://gerrit.wikimedia.org/r/352117 [12:00:11] (03CR) 10Elukey: [V: 032 C: 032] Set correct minute to Piwik's archiver crontab [puppet] - 10https://gerrit.wikimedia.org/r/352117 (owner: 10Elukey) [12:02:58] (03CR) 10Marostegui: "I am going to copy this one :-)" [puppet] - 10https://gerrit.wikimedia.org/r/352113 (owner: 10Jcrespo) [12:12:08] (03PS1) 10Marostegui: Increase default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352118 [12:16:58] !log reboot kafka1018 for kernel upgrades [12:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:23] PROBLEM - Apache HTTP on mw2221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:18:13] RECOVERY - Apache HTTP on mw2221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.041 second response time [12:20:24] 06Operations, 10ops-eqiad, 10Analytics-Cluster, 06Analytics-Kanban, 15User-Elukey: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#3238415 (10elukey) This is also interesting: ``` ===== NODE GROUP ===== (1) kafka1018.eqiad.wmnet ----- OUTPUT of 'grep "Hardware... 
[12:21:42] (03CR) 10Marostegui: [C: 031] Increase my default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352113 (owner: 10Jcrespo) [12:24:44] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/6307/" [puppet] - 10https://gerrit.wikimedia.org/r/352118 (owner: 10Marostegui) [12:34:40] !log akosiaris@tin Started deploy [librenms/librenms@9fa1391]: (no justification provided) [12:34:47] !log akosiaris@tin Finished deploy [librenms/librenms@9fa1391]: (no justification provided) (duration: 00m 07s) [12:34:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:34:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:37:43] PROBLEM - LibreNMS HTTPS on netmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - 285 bytes in 0.165 second response time [12:40:43] RECOVERY - LibreNMS HTTPS on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 8419 bytes in 0.208 second response time [12:44:31] (03PS1) 10Muehlenhoff: dumps: Explicitly declare package dependency for rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352123 (https://phabricator.wikimedia.org/T106477) [12:45:52] (03PS1) 10Aklapper: Fix "Tasks closed" SQL query in monthly Phab metrics report email [puppet] - 10https://gerrit.wikimedia.org/r/352125 (https://phabricator.wikimedia.org/T164297) [12:46:29] (03PS1) 10Gehel: elasticsearch - cleanup profile::elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/352126 [12:54:56] 06Operations, 10Traffic: Investigate nginx reload behavior - https://phabricator.wikimedia.org/T164579#3238201 (10BBlack) How timely! The subject of how to do completely-seamless reloads (especially for TCP) is quite thorny. I've been pondering it and fighting with the issues for years on the UDP side for gd... 
[12:57:27] 06Operations, 10Traffic: Investigate nginx reload behavior - https://phabricator.wikimedia.org/T164579#3238580 (10BBlack) Also note from that lengthy post - if we were willing to test the scalability of iptables on cache hosts (which we've avoided for fear that it won't scale over cores like the rest of what w... [13:07:34] Anyone here know anything about mediawiki-vagrant? [13:10:15] 06Operations, 13Patch-For-Review: Reduce rpcbind use - https://phabricator.wikimedia.org/T106477#3238615 (10MoritzMuehlenhoff) I doublechecked production hosts: Hosts which have an /etc/exports: dataset1001.wikimedia.org labstore1003.eqiad.wmnet labstore1004.eqiad.wmnet labstore1005.eqiad.wmnet ms1001.wikime... [13:14:23] (03CR) 10Jcrespo: [C: 031] Increase default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352118 (owner: 10Marostegui) [13:14:38] (03PS2) 10Jcrespo: Increase my default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352113 [13:18:10] (03CR) 10Aklapper: "Note: This query is obviously more expensive than before..." [puppet] - 10https://gerrit.wikimedia.org/r/352125 (https://phabricator.wikimedia.org/T164297) (owner: 10Aklapper) [13:23:07] (03PS21) 10Gehel: maps - move to role / profile [puppet] - 10https://gerrit.wikimedia.org/r/347006 [13:23:26] o/ [13:23:32] (03PS2) 10Muehlenhoff: dumps: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352098 (https://phabricator.wikimedia.org/T106477) [13:23:38] (03CR) 10Gehel: "First pass on Joe's comments. Moving to profile::redis::master is still required." 
(0316 comments) [puppet] - 10https://gerrit.wikimedia.org/r/347006 (owner: 10Gehel) [13:24:53] 06Operations, 10Traffic: Investigate nginx reload behavior - https://phabricator.wikimedia.org/T164579#3238690 (10BBlack) Hmmm another thing - when we first deployed this OCSP updating method, GlobalSign was giving us 8-hour OCSP validity windows. At present (just checked) we're getting 4-day validity from Gl... [13:25:58] !log labstore1005/1004 'dpkg -i /home/jmm/*deb' for rpcbind fix (these are new security packages from mortizm) [13:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:05] !log T163292: bootstrapping Cassandra on restbase1008-c [13:28:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:28:13] T163292: Failed disk / degraded RAID arrays: restbase1018.eqiad.wmnet - https://phabricator.wikimedia.org/T163292 [13:31:48] (03CR) 10Muehlenhoff: [C: 032] dumps: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352098 (https://phabricator.wikimedia.org/T106477) (owner: 10Muehlenhoff) [13:32:14] (03PS2) 10Rush: labstore: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352097 (https://phabricator.wikimedia.org/T106477) (owner: 10Muehlenhoff) [13:33:46] (03PS3) 10Jcrespo: Increase my default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352113 [13:34:52] (03CR) 10Rush: [C: 032] labstore: Explicitly declare package dependencies for nfs-common and rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352097 (https://phabricator.wikimedia.org/T106477) (owner: 10Muehlenhoff) [13:37:03] (03CR) 10Jcrespo: [C: 032] Increase my default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352113 (owner: 10Jcrespo) [13:37:16] (03PS4) 10Jcrespo: Increase my default screen scrollback buffer to 10000 lines [puppet] - 
10https://gerrit.wikimedia.org/r/352113 [13:38:21] (03PS1) 10Alexandros Kosiaris: Add WMF specific pages into the repo [software/librenms] - 10https://gerrit.wikimedia.org/r/352132 [13:38:49] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Add WMF specific pages into the repo [software/librenms] - 10https://gerrit.wikimedia.org/r/352132 (owner: 10Alexandros Kosiaris) [13:39:55] !log akosiaris@tin Started deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms [13:39:59] !log akosiaris@tin Finished deploy [librenms/librenms@c0aa3ca]: Deploy WMF specific pages to librenms (duration: 00m 03s) [13:40:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:40:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:51] !log installing remaining freetype security updates [13:45:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:45:58] 06Operations, 10Monitoring, 10netops, 13Patch-For-Review, 10Scap (Scap3-Adoption-Phase1): Deploy libreNMS with scap3 - https://phabricator.wikimedia.org/T129136#3238719 (10akosiaris) 05Open>03Resolved After a year and 2 months, I can finally happily resolve this. scap3 is now used to deploy librenms,... 
[13:50:09] (03PS2) 10Muehlenhoff: Strip nfs-common/rpcbind during jessie base installation [puppet] - 10https://gerrit.wikimedia.org/r/352105 (https://phabricator.wikimedia.org/T106477) [13:54:34] (03PS1) 10BBlack: ocsp updater: 1/day instead of 1/hr [puppet] - 10https://gerrit.wikimedia.org/r/352134 (https://phabricator.wikimedia.org/T164579) [13:54:59] (03PS1) 10BBlack: fqdn_rand cleanup: fix bad hour=>23 and minute=>59 [puppet] - 10https://gerrit.wikimedia.org/r/352135 [13:55:01] (03PS1) 10BBlack: fqdn_rand cleanup: always use a seed [puppet] - 10https://gerrit.wikimedia.org/r/352136 [13:55:05] (03PS1) 10BBlack: fqdn_rand cleanup: distinct seeds in same cron [puppet] - 10https://gerrit.wikimedia.org/r/352137 [13:55:48] 06Operations, 06Labs, 10hardware-requests: Codfw: (1) hardware access request for labtestnet2003 [region 2] - https://phabricator.wikimedia.org/T161764#3238756 (10chasemp) [13:56:20] 06Operations, 06Labs, 10hardware-requests: Codfw: (1) hardware access request for labtestnet2003 [region 2] - https://phabricator.wikimedia.org/T161764#3142232 (10chasemp) @robh note updated for 32GB per the current labnet spec [14:01:51] (03PS1) 10Muehlenhoff: stat: Explicitly declare package dependency for rpcbind [puppet] - 10https://gerrit.wikimedia.org/r/352138 (https://phabricator.wikimedia.org/T106477) [14:02:23] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/jynus/.screenrc] [14:02:24] (03CR) 10Alexandros Kosiaris: [C: 031] "heh, indeed. Nice. 
LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/352135 (owner: 10BBlack) [14:03:00] jynus: looks like a second puppet run fixes the issue, probably a race condition [14:03:23] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [14:03:36] interesting [14:04:34] (03PS2) 10Marostegui: Increase default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352118 [14:05:28] (03CR) 10Filippo Giunchedi: [C: 031] Strip nfs-common/rpcbind during jessie base installation [puppet] - 10https://gerrit.wikimedia.org/r/352105 (https://phabricator.wikimedia.org/T106477) (owner: 10Muehlenhoff) [14:05:44] (03CR) 10Marostegui: [C: 032] Increase default screen scrollback buffer to 10000 lines [puppet] - 10https://gerrit.wikimedia.org/r/352118 (owner: 10Marostegui) [14:05:46] (03CR) 10Alexandros Kosiaris: [C: 031] fqdn_rand cleanup: always use a seed [puppet] - 10https://gerrit.wikimedia.org/r/352136 (owner: 10BBlack) [14:07:11] (03CR) 10Alexandros Kosiaris: [C: 031] "heh, I did not expect those 2 to have strong correlation between them. LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/352137 (owner: 10BBlack) [14:10:46] 06Operations, 13Patch-For-Review, 15User-fgiunchedi: Delete non-used and/or non-requested thumbnail sizes periodically - https://phabricator.wikimedia.org/T162796#3238818 (10Gilles) OK. I'm not sure it makes sense to group them in ranges, we should be looking at widths individually. I.e. 1024 is probably ver... [14:12:08] (03CR) 10Gehel: "anything in logstash is purged after 31 days: https://github.com/wikimedia/puppet/blob/production/modules/logstash/manifests/output/elasti" [puppet] - 10https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: 10Filippo Giunchedi) [14:13:38] * elukey hugs gehel [14:13:41] thanks! 
[14:13:51] let me know if you want to know more :) [14:14:24] nono I am super fine, just wanted to make sure that we indeed respected data retention [14:14:46] thanks a lot for the work [14:15:19] (03PS2) 10BBlack: fqdn_rand cleanup: fix bad hour=>23 and minute=>59 [puppet] - 10https://gerrit.wikimedia.org/r/352135 [14:15:22] (03PS2) 10BBlack: fqdn_rand cleanup: always use a seed [puppet] - 10https://gerrit.wikimedia.org/r/352136 [14:15:23] (03PS2) 10BBlack: fqdn_rand cleanup: distinct seeds in same cron [puppet] - 10https://gerrit.wikimedia.org/r/352137 [14:15:41] (03CR) 10BBlack: [V: 032 C: 032] fqdn_rand cleanup: fix bad hour=>23 and minute=>59 [puppet] - 10https://gerrit.wikimedia.org/r/352135 (owner: 10BBlack) [14:15:49] we actually already mostly respected 31 days before the patch, there was an issue if logs with timestamps older than 31 days were received (don't ask why we would receive such logs) [14:15:51] (03CR) 10BBlack: [V: 032 C: 032] fqdn_rand cleanup: always use a seed [puppet] - 10https://gerrit.wikimedia.org/r/352136 (owner: 10BBlack) [14:16:03] (03CR) 10BBlack: [V: 032 C: 032] fqdn_rand cleanup: distinct seeds in same cron [puppet] - 10https://gerrit.wikimedia.org/r/352137 (owner: 10BBlack) [14:16:32] (03CR) 10Elukey: [C: 031] Send 5xx from kafkatee to logstash [puppet] - 10https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: 10Filippo Giunchedi) [14:16:45] (03PS2) 10BBlack: ocsp updater: 1/day instead of 1/hr [puppet] - 10https://gerrit.wikimedia.org/r/352134 (https://phabricator.wikimedia.org/T164579) [14:17:01] (03CR) 10BBlack: [V: 032 C: 032] ocsp updater: 1/day instead of 1/hr [puppet] - 10https://gerrit.wikimedia.org/r/352134 (https://phabricator.wikimedia.org/T164579) (owner: 10BBlack) [14:22:44] 06Operations, 10ops-codfw: setup naos/WMF6406 as new codfw deployment server - https://phabricator.wikimedia.org/T162900#3238848 (10Dzahn) I don't think so. 
Except mira needs a proper decom task with the checkbox-template for decoms on it. [14:23:29] 06Operations: Clean up wikimedia's apt repo regarding precise and lucid - https://phabricator.wikimedia.org/T164521#3238849 (10Dzahn) 05Open>03declined [14:27:57] 06Operations, 10ops-eqiad, 06Analytics-Kanban, 06DC-Ops: analytics1030 stuck in console while booting - https://phabricator.wikimedia.org/T162046#3238863 (10Cmjohnson) an1030's idrac fails to initialize, attempted reboot, drained flea power and still does not initialize. This most likely will require a n... [14:29:07] 06Operations, 10hardware-requests: Unmanaged switch for eqiad frack - https://phabricator.wikimedia.org/T164561#3238871 (10Cmjohnson) @ayounsi @robh I do not have any spare management switches. [14:30:15] !log restarting varnish frontend on cp4010 (text) for mem size update [14:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:48] 06Operations, 10hardware-requests: Unmanaged switch for eqiad frack - https://phabricator.wikimedia.org/T164561#3238899 (10RobH) Didn't frack already have a netgear switch in it, it needs two? [14:39:25] 06Operations, 06Operations-Software-Development: cumin could use randomization/splay options - https://phabricator.wikimedia.org/T164587#3238900 (10BBlack) [14:40:33] 06Operations, 10hardware-requests: Unmanaged switch for eqiad frack - https://phabricator.wikimedia.org/T164561#3238915 (10Cmjohnson) @robh yes, it will need 2. A standard msw and a frack only msw [14:41:01] !log restarting all maps+misc varnish frontends for mem sizing update (spread over the next ~1.5h) [14:41:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:42] 06Operations, 10hardware-requests: Unmanaged switch for eqiad frack - https://phabricator.wikimedia.org/T164561#3238920 (10RobH) Why is a standard msw needed if no non frack hosts are in frack? (Is it ONLY used for the pdu uplink?) 
Seems silly to have a msw sitting spare in frack and not used for anything bu... [14:42:05] cmjohnson1: Im gonna move that single task in to s4 for the order, fyi [14:42:11] i just wanted to get full info on it for approvals [14:43:48] 06Operations, 13Patch-For-Review: Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait - https://phabricator.wikimedia.org/T136094#3238925 (10MoritzMuehlenhoff) 05Resolved>03Open Despite what's documented in the sysctl.d(5) manpage, this does not fix the race; kafka1018 was rebooted t... [14:44:09] 06Operations, 10ops-codfw: decom or reclaim mira - https://phabricator.wikimedia.org/T164588#3238927 (10Dzahn) [14:45:21] 06Operations, 10ops-codfw: Swap NIC on mira - https://phabricator.wikimedia.org/T162859#3177514 (10Dzahn) decom-or-reclaim task: T164588 if it becomes decom, this can be rejected. if it becomes reclaim, this should still be done. [14:45:59] 06Operations, 10hardware-requests: spare pool allocation of WMF6406 to replace mira - https://phabricator.wikimedia.org/T162897#3238952 (10Dzahn) [14:46:01] 06Operations, 10ops-codfw: setup naos/WMF6406 as new codfw deployment server - https://phabricator.wikimedia.org/T162900#3178786 (10Dzahn) 05Open>03Resolved follow-up task for mira created at T164588 closing as resolved [14:48:20] 07Puppet, 10Beta-Cluster-Infrastructure, 10Phabricator, 13Patch-For-Review: puppet failure on deployment-phab01 ... is not a Hash. It looks to be a Array at /etc/puppet/modules/phabricator/manifests/init.pp:68 - https://phabricator.wikimedia.org/T147818#3238955 (10hashar) I have unbroke deployment-phab01... 
[14:49:11] 07Puppet, 10Beta-Cluster-Infrastructure, 10Phabricator, 13Patch-For-Review: puppet failure on deployment-phab01: Service[ssh-phab] refuses to start - https://phabricator.wikimedia.org/T147818#3238957 (10hashar) [14:51:09] 07Puppet, 10Beta-Cluster-Infrastructure, 10Phabricator, 13Patch-For-Review: puppet failure on deployment-phab01: Service[ssh-phab] refuses to start - https://phabricator.wikimedia.org/T147818#2704482 (10hashar) [14:57:28] 06Operations, 10ops-eqiad: decommission lutetium - https://phabricator.wikimedia.org/T164398#3238989 (10Cmjohnson) [14:59:05] 06Operations, 10ops-eqiad: decommission lutetium - https://phabricator.wikimedia.org/T164398#3232158 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Disks removed and wiped, server dns (Mgmt and Production) removed, removed from rack and racktables updated. switch ports were turned down with T164554 [14:59:13] !log nginx upgrading to 1.11.10-1+wmf1 on cache_text [14:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:38] 06Operations, 10ops-eqiad: decommission db1025 - https://phabricator.wikimedia.org/T164397#3239003 (10Cmjohnson) [14:59:45] 06Operations, 10ops-eqiad: decommission db1025 - https://phabricator.wikimedia.org/T164397#3232141 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Disks removed and wiped, server dns (Mgmt and Production) removed, removed from rack and racktables updated. 
switch ports were turned down with T164554 [15:00:25] 06Operations, 10Monitoring, 10Traffic, 15User-fgiunchedi: Add node_exporter ipvs ipv6 support - https://phabricator.wikimedia.org/T160156#3239010 (10fgiunchedi) [15:03:48] (03CR) 10Filippo Giunchedi: [C: 031] "To be merged next week" [puppet] - 10https://gerrit.wikimedia.org/r/350817 (https://phabricator.wikimedia.org/T149451) (owner: 10Filippo Giunchedi) [15:04:49] !log nginx upgrading to 1.11.10-1+wmf1 on cache_upload [15:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:19] 07Puppet, 10Beta-Cluster-Infrastructure, 10Phabricator, 13Patch-For-Review: puppet failure on deployment-phab01: Service[ssh-phab] refuses to start - https://phabricator.wikimedia.org/T147818#3239033 (10Paladox) Probably because it needs a different ip as it can't bound to the same port 22 on the same ip. [15:13:37] !log increase nginx error log verbosity on mw2146 as test for T164586 [15:13:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:45] T164586: Pageview hourly data in Pivot is not showing up correctly - https://phabricator.wikimedia.org/T164586 [15:14:32] 06Operations, 10Wikimedia-SVG-rendering: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947#3239079 (10Perhelion) I found something strange which intensify (or lower) the bug: # https://upload.wikimedia.org/wikipedia/commons/thumb/... [15:15:51] 06Operations, 06Operations-Software-Development: cumin could use randomization/splay options - https://phabricator.wikimedia.org/T164587#3239084 (10Volans) @BBlack Thanks for opening this feature request, because right now it's totally implementation dependent and actually I realized this is neither clear nor... 
[15:18:12] !log increase nginx error log verbosity on mw2146 as test for T163674 (correct task) [15:18:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:20] T163674: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674 [15:21:19] (03PS1) 10Marostegui: db-codfw.php: Repool db2052, depool db2045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352167 (https://phabricator.wikimedia.org/T162539) [15:22:21] 06Operations, 10Pybal, 10Traffic, 10netops: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674#3239118 (10elukey) I can see some interesting logs on mw2146 with error log set to info: ``` 2017/05/05 15:20:53 [info] 7794#7794: *7 client timed out (110: Connection t... [15:23:45] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool db2052, depool db2045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352167 (https://phabricator.wikimedia.org/T162539) (owner: 10Marostegui) [15:26:56] (03Merged) 10jenkins-bot: db-codfw.php: Repool db2052, depool db2045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352167 (https://phabricator.wikimedia.org/T162539) (owner: 10Marostegui) [15:27:05] (03CR) 10jenkins-bot: db-codfw.php: Repool db2052, depool db2045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352167 (https://phabricator.wikimedia.org/T162539) (owner: 10Marostegui) [15:27:37] (03CR) 10Andrew Bogott: "So you mean switch based on the salt version on the client? 
Is that kind of thing supported by puppet or will I need to hack up a bash ch" [puppet] - 10https://gerrit.wikimedia.org/r/351914 (owner: 10Andrew Bogott) [15:28:12] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2052, depool db2045 - T162539 T163548 (duration: 00m 41s) [15:28:16] (03PS1) 10Andrew Bogott: labs::lvm: Fix extend-instance-vol to work on stretch [puppet] - 10https://gerrit.wikimedia.org/r/352168 (https://phabricator.wikimedia.org/T164534) [15:28:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:20] !log Deploy alter table on wikidatawiki.wb_terms - db2045 - https://phabricator.wikimedia.org/T162539 https://phabricator.wikimedia.org/T163548 [15:28:21] T162539: Deploy schema change for adding term_full_entity_id column to wb_terms table - https://phabricator.wikimedia.org/T162539 [15:28:21] T163548: Drop the useless wb_terms keys "wb_terms_entity_type" and "wb_terms_type" on "wb_terms" table - https://phabricator.wikimedia.org/T163548 [15:28:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:01] (03CR) 10Andrew Bogott: "I HATE this and welcome alternative suggestions" [puppet] - 10https://gerrit.wikimedia.org/r/352168 (https://phabricator.wikimedia.org/T164534) (owner: 10Andrew Bogott) [15:30:31] !log running schema change on puppet.fact_values (m1) [15:30:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:06] 06Operations, 10Pybal, 10Traffic, 10netops: Frequent RST returned by appservers to LVS hosts - https://phabricator.wikimedia.org/T163674#3239166 (10elukey) Red herring, I found a way to reproduce the problem. I've set up `sudo tcpdump -n -v -i lo port 443` in tmux on mw2146 and ran the following requests:... 
[15:47:52] (03PS1) 10Cmjohnson: Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 [15:48:05] 06Operations, 10ops-eqiad, 10DBA, 13Patch-For-Review: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3239219 (10Marostegui) @Cmjohnson just checking if in the end you updated the idrac firmware? No pushing by any means, just checking if I need to powercycle this host next week or not. Thank... [15:48:09] (03CR) 10jerkins-bot: [V: 04-1] Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 (owner: 10Cmjohnson) [15:50:09] (03PS2) 10Cmjohnson: Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 [15:50:27] (03CR) 10jerkins-bot: [V: 04-1] Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 (owner: 10Cmjohnson) [15:51:43] (03CR) 10Dzahn: ""PRE" instead of "PTR" on line 71 in 10.id-addr.arpa" (031 comment) [dns] - 10https://gerrit.wikimedia.org/r/352169 (owner: 10Cmjohnson) [15:52:20] thanks mutante just saw that [15:52:31] (03PS3) 10Cmjohnson: Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 [15:53:31] (03CR) 10Cmjohnson: [C: 032] Adding dns entries for frlog1001 both production and mgmt T163127 [dns] - 10https://gerrit.wikimedia.org/r/352169 (owner: 10Cmjohnson) [15:56:14] 06Operations, 10ops-codfw, 10netops: codfw: ganeti2007-ganeti2008 switch power configuration - https://phabricator.wikimedia.org/T164594#3239225 (10Papaul) [15:56:34] (03PS1) 10DCausse: [WIP] Switch this repo to a deb package [software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/352170 (https://phabricator.wikimedia.org/T158560) [15:56:39] 06Operations, 10ops-codfw, 10netops: codfw: ganeti2007-ganeti2008 switch port configuration - 
https://phabricator.wikimedia.org/T164594#3239241 (10Papaul) [15:56:56] (03CR) 10Dzahn: "counting votes we have:" [puppet] - 10https://gerrit.wikimedia.org/r/351225 (owner: 10Dzahn) [15:57:19] (03Abandoned) 10Dzahn: puppet-lint: ignore arrow alignment [puppet] - 10https://gerrit.wikimedia.org/r/351225 (owner: 10Dzahn) [16:00:23] (03PS1) 10Jcrespo: Depool db1070 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352171 (https://phabricator.wikimedia.org/T160392) [16:00:55] ^I need a quick review because I am blocking chris [16:03:06] 06Operations, 10ops-codfw, 13Patch-For-Review: codfw: ganeti2007-ganeti2008 racking and onsite setup task - https://phabricator.wikimedia.org/T164011#3239252 (10Papaul) [16:03:20] (03CR) 10Jcrespo: [C: 032] Depool db1070 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352171 (https://phabricator.wikimedia.org/T160392) (owner: 10Jcrespo) [16:04:32] (03Merged) 10jenkins-bot: Depool db1070 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352171 (https://phabricator.wikimedia.org/T160392) (owner: 10Jcrespo) [16:06:07] (03CR) 10jenkins-bot: Depool db1070 for hardware maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352171 (https://phabricator.wikimedia.org/T160392) (owner: 10Jcrespo) [16:06:43] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1070 for hw maintenance (duration: 00m 39s) [16:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:46] 06Operations, 06Labs: Stretch vs. Salt - https://phabricator.wikimedia.org/T164595#3239263 (10Andrew) [16:08:58] 06Operations, 06Labs: Stretch vs. 
Salt - https://phabricator.wikimedia.org/T164595#3239277 (10Andrew) [16:09:57] !log shutting down db1070 for hw maintenance T160392 [16:10:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:10:06] T160392: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392 [16:12:02] 06Operations, 06Labs: Stretch vs. Salt - https://phabricator.wikimedia.org/T164595#3239285 (10Andrew) [16:12:25] 06Operations, 10Phabricator, 06Release-Engineering-Team, 13Patch-For-Review: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239287 (10Paladox) This is currently working in labs. But we will see if it actually works when ph... [16:12:36] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239304 (10Paladox) [16:14:12] 06Operations, 06Labs: Stretch vs. Salt - https://phabricator.wikimedia.org/T164595#3239314 (10Paladox) i wonder is it because of this part Depends: python:any (< 2.8) Depends: python:any (>= 2.7.5-5~) since Depends: python:any (>= 2.7.5-5~) does not exist on stretch. Only version there is 2.7... [16:28:04] 06Operations, 06Labs: Stretch vs. Salt - https://phabricator.wikimedia.org/T164595#3239367 (10Andrew) This seems to ultimately come down to conflicts incurred by libssl1.1's dep list [16:42:35] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239429 (10Dzahn) @Paladox By "this" you mean that you can currently take a jessie labs instance, apply the same puppet r... 
[16:43:45] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239433 (10Paladox) Ok. I haven't tested by making a new instance. But everything looks in working order. I can run syste... [16:43:53] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239434 (10Paladox) 05Open>03Resolved a:03Paladox [16:45:38] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239436 (10Dzahn) Well, if you **haven't tested**, then please don't claim it works, reopen and test it. [16:45:49] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239437 (10Dzahn) 05Resolved>03Open [16:49:50] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239443 (10Paladox) >>! In T158434#3239436, @Dzahn wrote: > Well, if you **haven't tested**, then please don't claim it w... [17:05:53] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:53] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:53] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:53] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[17:05:53] PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:54] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:54] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:05:55] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:06:43] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [17:06:51] planned ^ ? [17:06:53] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy [17:07:43] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [17:08:43] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [17:09:43] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy [17:09:43] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [17:09:44] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy [17:09:44] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [17:10:12] 07Puppet, 10DBA, 10Monitoring: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239514 (10jcrespo) [17:11:19] 07Puppet, 10DBA, 10Monitoring: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239534 (10jcrespo) [17:13:54] 06Operations, 10ops-eqiad, 10DBA, 13Patch-For-Review: Reset db1070 idrac - https://phabricator.wikimedia.org/T160392#3239556 (10Cmjohnson) I updated the firmware on db1070 and ipmitool is still not working, I compared the idrac settings via the gui with db1068 (ipmi works) and not differences between the t... [17:15:41] 06Operations, 06Labs: Stretch vs. 
Salt - https://phabricator.wikimedia.org/T164595#3239570 (10Andrew) https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844706 [17:16:35] 07Puppet, 10DBA, 10Monitoring: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239573 (10jcrespo) [17:16:38] 06Operations, 06Labs: Stretch vs. Salt - https://phabricator.wikimedia.org/T164595#3239574 (10MoritzMuehlenhoff) That's for jessie->stretch updates, see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=844706 We could either - use the salt version shipped in stretch on stretch installations (this needs to... [17:34:36] (03PS1) 10Jcrespo: Revert "Depool db1070 for hardware maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352180 [17:42:29] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack and cable frlog1001 - https://phabricator.wikimedia.org/T163127#3239678 (10Cmjohnson) frlog1001 ilom has been setup raid cfg is set to raid 1+0. labeled and racktables updated Connected pfw1/0/8 Still needs @ayounsi to configure network port [17:44:48] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3239693 (10mmodell) I still don't think it works 100% without a little manual intervention. It's damn close though. 
[17:45:02] 07Puppet, 10DBA, 10Monitoring, 07Documentation: Document performance optimization of servermon and/or puppet reporting tools - https://phabricator.wikimedia.org/T164604#3239694 (10Reedy) [17:56:40] 06Operations, 10Traffic: Merge cache_maps into cache_upload functionally - https://phabricator.wikimedia.org/T164608#3239715 (10BBlack) [17:57:04] 06Operations, 10Traffic: Merge cache_misc into cache_text functionally - https://phabricator.wikimedia.org/T164609#3239728 (10BBlack) [17:58:43] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 314 MB (3% inode=62%) [18:00:25] 06Operations, 10Traffic: Unprovision cache_misc @ ulsfo - https://phabricator.wikimedia.org/T164610#3239748 (10BBlack) [18:00:43] RECOVERY - Disk space on ocg1002 is OK: DISK OK [18:00:45] 06Operations, 10Traffic: Unprovision cache_misc @ ulsfo - https://phabricator.wikimedia.org/T164610#3239764 (10BBlack) [18:00:47] 06Operations, 10ops-ulsfo, 10Traffic, 13Patch-For-Review: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3229950 (10BBlack) [18:05:09] 06Operations, 10ops-codfw: Swap NIC on mira - https://phabricator.wikimedia.org/T162859#3239777 (10RobH) 05stalled>03declined system is out of warranty, thanks for making the decom task (in future all reclaim/decom tasks should also have #hardware-requests), I'll handle the decom side from here out. Thanks! 
[18:05:51] 06Operations, 10ops-codfw, 10hardware-requests: decom mira - https://phabricator.wikimedia.org/T164588#3239779 (10RobH) p:05Triage>03Normal a:03RobH [18:07:43] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%) [18:10:43] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=62%) [18:18:43] RECOVERY - Disk space on ocg1002 is OK: DISK OK [18:21:43] !log ocg1002 - apt-get clean'ed for disk space [18:21:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:33] PROBLEM - Freshness of OCSP Stapling files on cp4021 is CRITICAL: CRITICAL: File /var/cache/ocsp/globalsign-2016-rsa-unified.ocsp is more than 18300 secs old! [18:28:27] 06Operations, 13Patch-For-Review, 15User-Elukey, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3239815 (10Krinkle) [18:28:50] (03PS2) 10Krinkle: Re-enable persistent connection to Redis for jobrunners [mediawiki-config] - 10https://gerrit.wikimedia.org/r/351854 (https://phabricator.wikimedia.org/T125735) (owner: 10Elukey) [18:29:15] 06Operations, 13Patch-For-Review, 15User-Elukey, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#1996056 (10Krinkle) [18:37:33] PROBLEM - Freshness of OCSP Stapling files on cp4017 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:38:57] ^ ah I changed the cron timing, and validated everything was ok on the main ssl check (which checks TLS OCSP Stapling outputs directly), but I failed to look at the separate icinga check for file freshness, which still assumes 1h cron timing... 
[18:39:01] I'll fix that RQ [18:39:03] PROBLEM - Freshness of OCSP Stapling files on cp1047 is CRITICAL: CRITICAL: File /var/cache/ocsp/globalsign-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:40:33] PROBLEM - Freshness of OCSP Stapling files on cp4006 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:40:53] PROBLEM - Freshness of OCSP Stapling files on cp1073 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:41:03] PROBLEM - Freshness of OCSP Stapling files on cp1052 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:42:43] PROBLEM - Freshness of OCSP Stapling files on cp3045 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-rsa-unified.ocsp is more than 18300 secs old! [18:42:51] (03PS1) 10BBlack: Align OCSP file freshness check with new cron timing [puppet] - 10https://gerrit.wikimedia.org/r/352185 (https://phabricator.wikimedia.org/T164579) [18:44:01] (03PS2) 10BBlack: Align OCSP file freshness check with new cron timing [puppet] - 10https://gerrit.wikimedia.org/r/352185 (https://phabricator.wikimedia.org/T164579) [18:44:33] PROBLEM - Freshness of OCSP Stapling files on cp4013 is CRITICAL: CRITICAL: File /var/cache/ocsp/globalsign-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:44:41] (03CR) 10BBlack: [V: 032 C: 032] Align OCSP file freshness check with new cron timing [puppet] - 10https://gerrit.wikimedia.org/r/352185 (https://phabricator.wikimedia.org/T164579) (owner: 10BBlack) [18:46:33] PROBLEM - Freshness of OCSP Stapling files on cp4016 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-ecdsa-unified.ocsp is more than 18300 secs old! [18:46:43] PROBLEM - Freshness of OCSP Stapling files on cp3046 is CRITICAL: CRITICAL: File /var/cache/ocsp/digicert-2016-rsa-unified.ocsp is more than 18300 secs old! 
[18:47:53] RECOVERY - Freshness of OCSP Stapling files on cp1073 is OK: OK [18:48:03] RECOVERY - Freshness of OCSP Stapling files on cp1047 is OK: OK [18:48:03] RECOVERY - Freshness of OCSP Stapling files on cp1052 is OK: OK [18:48:33] RECOVERY - Freshness of OCSP Stapling files on cp4006 is OK: OK [18:48:33] RECOVERY - Freshness of OCSP Stapling files on cp4021 is OK: OK [18:48:33] RECOVERY - Freshness of OCSP Stapling files on cp4013 is OK: OK [18:48:33] RECOVERY - Freshness of OCSP Stapling files on cp4016 is OK: OK [18:48:33] RECOVERY - Freshness of OCSP Stapling files on cp4017 is OK: OK [18:48:43] RECOVERY - Freshness of OCSP Stapling files on cp3046 is OK: OK [18:48:45] RECOVERY - Freshness of OCSP Stapling files on cp3045 is OK: OK [18:49:54] (03CR) 10Thcipriani: [C: 031] "needed before scap release after 3.5.7" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/351548 (owner: 10Chad) [18:50:03] PROBLEM - Freshness of OCSP Stapling files on cp1008 is CRITICAL: CRITICAL: File /var/cache/ocsp/globalsign-2016-rsa-unified.ocsp is more than 18300 secs old! 
[18:51:34] (03CR) 10Jcrespo: [C: 032] Revert "Depool db1070 for hardware maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352180 (owner: 10Jcrespo) [18:52:03] RECOVERY - Freshness of OCSP Stapling files on cp1008 is OK: OK [18:53:23] (03Merged) 10jenkins-bot: Revert "Depool db1070 for hardware maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352180 (owner: 10Jcrespo) [18:53:35] (03CR) 10jenkins-bot: Revert "Depool db1070 for hardware maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352180 (owner: 10Jcrespo) [18:54:49] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1070 after maintenance (duration: 00m 40s) [18:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:21] (03PS1) 10Jcrespo: db: Comment db1015 being defective [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352188 [19:34:16] 06Operations, 10DBA: Create less overhead on bacula jobs when dumping production databases - https://phabricator.wikimedia.org/T162789#3239953 (10jcrespo) It took a bit more than 11 hours to reload logically db1015 (minus cebwiki) - that is 1.3 TB (out of a total of 1.5TB for all of s3). It is now back replica... 
[19:50:13] RECOVERY - cassandra-c CQL 10.64.48.100:9042 on restbase1018 is OK: TCP OK - 0.036 second response time on 10.64.48.100 port 9042 [20:17:24] (03CR) 10Chad: [C: 032] Add at least a baseline scap.cfg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/351548 (owner: 10Chad) [20:17:49] thcipriani: Eh, why not ^ [20:18:50] (03Merged) 10jenkins-bot: Add at least a baseline scap.cfg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/351548 (owner: 10Chad) [20:18:58] (03CR) 10jenkins-bot: Add at least a baseline scap.cfg [mediawiki-config] - 10https://gerrit.wikimedia.org/r/351548 (owner: 10Chad) [20:21:02] !log demon@tin Synchronized scap/scap.cfg: no-op (duration: 00m 39s) [20:21:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:16] thcipriani: Sync'd, plus confirmed it doesn't break scap ^ [20:21:22] (03PS1) 10Dereckson: Revert "planet: Remove blog.wikiwix.com from fr and en feeds" [puppet] - 10https://gerrit.wikimedia.org/r/352230 [20:21:43] RainbowSprinkles: 2 for 1, nice. [20:25:50] (03CR) 10Dzahn: "ok, thank you. just want to tell them that https URLs would be much nicer and they have them, they are just broken because cert issues. le" [puppet] - 10https://gerrit.wikimedia.org/r/352230 (owner: 10Dereckson) [20:26:09] (03CR) 10Dzahn: [C: 032] Revert "planet: Remove blog.wikiwix.com from fr and en feeds" [puppet] - 10https://gerrit.wikimedia.org/r/352230 (owner: 10Dereckson) [20:29:30] (03CR) 10Dzahn: "@Johan Colliez wanna give https://letsencrypt.org/getting-started/ a try to make https work for your feed without cert issues?" [puppet] - 10https://gerrit.wikimedia.org/r/352230 (owner: 10Dereckson) [20:31:58] (03CR) 10Dereckson: "I notified by mail Johan about the HTTPS issue." [puppet] - 10https://gerrit.wikimedia.org/r/352230 (owner: 10Dereckson) [20:33:48] (03CR) 10Dzahn: "nice, thanks! 
i'm interested in further reducing the number of non-https URLs in planet files in general, so if you see any that can be up" [puppet] - 10https://gerrit.wikimedia.org/r/352230 (owner: 10Dereckson) [20:34:34] (03CR) 10BryanDavis: "> I HATE this and welcome alternative suggestions" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/352168 (https://phabricator.wikimedia.org/T164534) (owner: 10Andrew Bogott) [20:40:03] (03PS1) 10Urbanecm: Create Autor and Portal namespaces on Spanish Wikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/352250 (https://phabricator.wikimedia.org/T164195) [20:57:01] (03PS3) 10BearND: Limit FeaturedFeed on dewiki to last seven days [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341267 (https://phabricator.wikimedia.org/T159664) [21:07:51] (03PS1) 10Dzahn: nagios_common: test contact template with private/hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/352269 [21:08:53] (03CR) 10jerkins-bot: [V: 04-1] nagios_common: test contact template with private/hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/352269 (owner: 10Dzahn) [21:10:15] (03PS2) 10Dzahn: nagios_common: test contact template with private/hiera lookup [puppet] - 10https://gerrit.wikimedia.org/r/352269 [21:17:50] (03CR) 10Dzahn: "i tested the query and it took 6.66 sec :p" [puppet] - 10https://gerrit.wikimedia.org/r/352125 (https://phabricator.wikimedia.org/T164297) (owner: 10Aklapper) [21:18:57] (03CR) 10Dzahn: [C: 031] Fix "Tasks closed" SQL query in monthly Phab metrics report email [puppet] - 10https://gerrit.wikimedia.org/r/352125 (https://phabricator.wikimedia.org/T164297) (owner: 10Aklapper) [21:19:35] 06Operations, 10Cassandra, 13Patch-For-Review, 06Services (doing): Failed disk / degraded RAID arrays: restbase1018.eqiad.wmnet - https://phabricator.wikimedia.org/T163292#3240126 (10Eevans) 05Open>03Resolved [21:34:56] 06Operations, 10Deployment-Systems, 10Scap: setup automatic deletion of old l10nupdate - 
https://phabricator.wikimedia.org/T130317#3240155 (10demon) 05Open>03Resolved a:03demon This is fixed via T119747 (a dupe really) [21:49:16] (03PS1) 10Dzahn: add fake icinga contacts [labs/private] - 10https://gerrit.wikimedia.org/r/352272 [21:51:04] (03PS2) 10Dzahn: add fake icinga contacts [labs/private] - 10https://gerrit.wikimedia.org/r/352272 [21:51:33] (03CR) 10Dzahn: [C: 032] "this is labs/private to make the puppet compiler work for Icinga changes" [labs/private] - 10https://gerrit.wikimedia.org/r/352272 (owner: 10Dzahn) [21:51:52] (03CR) 10Dzahn: [V: 032 C: 032] add fake icinga contacts [labs/private] - 10https://gerrit.wikimedia.org/r/352272 (owner: 10Dzahn) [21:59:03] (03Draft1) 10Paladox: Icinga: Add systemd script, also convert to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/352274 [21:59:18] (03CR) 10Paladox: "Whoops, this was never meant to be public." [puppet] - 10https://gerrit.wikimedia.org/r/352274 (owner: 10Paladox) [21:59:20] (03Abandoned) 10Paladox: Icinga: Add systemd script, also convert to base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/352274 (owner: 10Paladox) [21:59:56] (03PS1) 10Dzahn: icinga contacts: follow-up fixes in formatting example [labs/private] - 10https://gerrit.wikimedia.org/r/352275 [22:01:07] (03CR) 10Dzahn: [V: 032 C: 032] icinga contacts: follow-up fixes in formatting example [labs/private] - 10https://gerrit.wikimedia.org/r/352275 (owner: 10Dzahn) [22:01:12] (03PS2) 10Dzahn: icinga contacts: follow-up fixes in formatting example [labs/private] - 10https://gerrit.wikimedia.org/r/352275 [22:09:07] 06Operations, 10Wikimedia-SVG-rendering: Incorrect text positioning in SVG rasterization (scale/transform; font-size; kerning) - https://phabricator.wikimedia.org/T36947#3240231 (10Glrx) I don't know if it is relevant, but the //Bosch Composition.svg// text was stroked. Stroked text can grow or shrink dependin... 
[22:15:11] (03Draft1) 10Paladox: base::standard_packages: Remove ubuntu precise check [puppet] - 10https://gerrit.wikimedia.org/r/352278 [22:15:14] (03PS2) 10Paladox: base::standard_packages: Remove ubuntu precise check [puppet] - 10https://gerrit.wikimedia.org/r/352278 [22:23:04] (03PS1) 10Dzahn: icinga contacts: need matching contact name in hiera for testing [labs/private] - 10https://gerrit.wikimedia.org/r/352280 [22:23:41] (03CR) 10Dzahn: [V: 032 C: 032] "number from fakenumber.org" [labs/private] - 10https://gerrit.wikimedia.org/r/352280 (owner: 10Dzahn) [22:23:45] (03PS2) 10Dzahn: icinga contacts: need matching contact name in hiera for testing [labs/private] - 10https://gerrit.wikimedia.org/r/352280 [22:41:30] (03PS2) 10Madhuvishy: sge: Fix global config handling [puppet] - 10https://gerrit.wikimedia.org/r/351379 (https://phabricator.wikimedia.org/T162955) [22:41:32] (03PS1) 10Madhuvishy: gridengine: Cleanup mergeconf script and references [puppet] - 10https://gerrit.wikimedia.org/r/352281 (https://phabricator.wikimedia.org/T162955) [22:49:45] (03Draft1) 10Paladox: Fix dzahns example [labs/private] - 10https://gerrit.wikimedia.org/r/352283 [22:49:48] (03PS2) 10Paladox: Fix dzahns example [labs/private] - 10https://gerrit.wikimedia.org/r/352283 [22:50:20] (03PS3) 10Paladox: Fix dzahns example [labs/private] - 10https://gerrit.wikimedia.org/r/352283 [22:52:46] (03PS4) 10Paladox: icinga contacts: Fix dzahns example [labs/private] - 10https://gerrit.wikimedia.org/r/352283 [22:53:25] (03CR) 10Dzahn: [V: 032 C: 032] "thanks !:)" [labs/private] - 10https://gerrit.wikimedia.org/r/352283 (owner: 10Paladox) [22:53:34] your welcome :) [22:56:06] (03CR) 10Dzahn: "well, not quite working yet http://puppet-compiler.wmflabs.org/6318/einsteinium.wikimedia.org/change.einsteinium.wikimedia.org.err" [labs/private] - 10https://gerrit.wikimedia.org/r/352283 (owner: 10Paladox) [23:55:14] (03PS1) 10Madhuvishy: gridengine: Cleanup old scripts, tracker and collector [puppet] - 
10https://gerrit.wikimedia.org/r/352294 (https://phabricator.wikimedia.org/T162955)