[10:13:01] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[10:23:14] (PS1) Ema: Package new upstream release: 4.1.2-1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/279592 (https://phabricator.wikimedia.org/T122880)
[10:25:10] (CR) Ema: [C: 2 V: 2] Package new upstream release: 4.1.2-1 [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/279592 (https://phabricator.wikimedia.org/T122880) (owner: Ema)
[10:42:22] (PS6) Alexandros Kosiaris: Introducing changeprop role and puppet module [puppet] - https://gerrit.wikimedia.org/r/275772 (https://phabricator.wikimedia.org/T128463) (owner: Mobrovac)
[10:48:38] (CR) Hashar: "I am using my web browser web inspector to look at what the site store. The 'session_id' cookie is still set to expire after ~2h15. At le" [puppet] - https://gerrit.wikimedia.org/r/279186 (https://phabricator.wikimedia.org/T130621) (owner: Andrew Bogott)
[10:50:12] (PS7) Alexandros Kosiaris: Introducing changeprop role and puppet module [puppet] - https://gerrit.wikimedia.org/r/275772 (https://phabricator.wikimedia.org/T128463) (owner: Mobrovac)
[10:50:47] (CR) Alexandros Kosiaris: [C: 2 V: 2] "Amended to use hiera instead of the role::kafka::main::config lookup. LGTM, merging" [puppet] - https://gerrit.wikimedia.org/r/275772 (https://phabricator.wikimedia.org/T128463) (owner: Mobrovac)
[10:55:33] (PS1) Alexandros Kosiaris: Introduce changeprop.svc.${::site}.wmnet [dns] - https://gerrit.wikimedia.org/r/279594 (https://phabricator.wikimedia.org/T128463)
[11:00:05] (CR) Alexandros Kosiaris: [C: 2] Introduce changeprop.svc.${::site}.wmnet [dns] - https://gerrit.wikimedia.org/r/279594 (https://phabricator.wikimedia.org/T128463) (owner: Alexandros Kosiaris)
[11:03:37] (PS2) Volans: DB: Expose Puppet SSL certs and generate CA cert [puppet/mariadb] - https://gerrit.wikimedia.org/r/278042 (https://phabricator.wikimedia.org/T111654)
[11:08:57] (CR) Volans: "@Jcrespo: Being a submodule I'm merging it so I can test it with the puppet compiler. In case of issues I'll revert it." [puppet/mariadb] - https://gerrit.wikimedia.org/r/278042 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[11:09:17] (CR) Volans: [C: 2] DB: Expose Puppet SSL certs and generate CA cert [puppet/mariadb] - https://gerrit.wikimedia.org/r/278042 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[11:16:16] (PS2) Alexandros Kosiaris: Assign changeprop service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/275891 (https://phabricator.wikimedia.org/T128463) (owner: Mobrovac)
[11:17:05] (CR) Alexandros Kosiaris: [C: 2] Assign changeprop service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/275891 (https://phabricator.wikimedia.org/T128463) (owner: Mobrovac)
[11:26:45] (PS1) Volans: [WIP] DB: Expose Puppet SSL certs and generate CA cert [puppet] - https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654)
[11:28:21] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:31:59] PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:36:47] PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[11:37:19] PROBLEM - changeprop endpoints health on scb2001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.192.32.132, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[11:38:58] PROBLEM - puppet last run on scb2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:42:08] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:48:19] (PS1) Mobrovac: service::node: Have a proper shell and home for the service user [puppet] - https://gerrit.wikimedia.org/r/279597
[11:49:26] (CR) jenkins-bot: [V: -1] service::node: Have a proper shell and home for the service user [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[11:50:17] (PS2) Mobrovac: service::node: Have a proper shell and home for the service user [puppet] - https://gerrit.wikimedia.org/r/279597
[11:51:29] ACKNOWLEDGEMENT - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) Marko Obrovac first deployment of change propagation under way
[11:51:30] ACKNOWLEDGEMENT - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures Marko Obrovac first deployment of change propagation under way
[11:53:10] (CR) Mobrovac: "Looking good - https://puppet-compiler.wmflabs.org/2165/ ." [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[11:55:32] (PS9) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[12:04:47] PROBLEM - changeprop endpoints health on scb2002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.192.48.43, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[12:07:46] PROBLEM - changeprop endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[12:13:00] (PS1) Ema: pbuilderrc: set GIT_PBUILDER_OUTPUT_DIR [puppet] - https://gerrit.wikimedia.org/r/279600
[12:34:12] Operations, DBA, Patch-For-Review: implement performance_schema for mysql monitoring - https://phabricator.wikimedia.org/T99485#1292599 (Nemo_bis) > jcrespo moved this task to Next on the DBA workboard. Why is this "Next" when T69223 isn't?
[12:38:55] (Abandoned) Ema: pbuilderrc: set GIT_PBUILDER_OUTPUT_DIR [puppet] - https://gerrit.wikimedia.org/r/279600 (owner: Ema)
[12:54:36] (PS1) Ema: Varnish 4 APT pinning: include varnish-doc [puppet] - https://gerrit.wikimedia.org/r/279603
[12:57:24] (CR) Ema: [C: 2 V: 2] Varnish 4 APT pinning: include varnish-doc [puppet] - https://gerrit.wikimedia.org/r/279603 (owner: Ema)
[13:00:44] (CR) Fjalapeno: "Dereckson - we are content to just let this one roll out with the normal deployment schedule, unless SWAT is required for this patch. Let " [mediawiki-config] - https://gerrit.wikimedia.org/r/279394 (https://phabricator.wikimedia.org/T128795) (owner: Fjalapeno)
[13:10:02] (CR) Dereckson: "For the configuration, there isn't really any "releases" schedule: config changes are deployed one per one, or by group of similar changes" [mediawiki-config] - https://gerrit.wikimedia.org/r/279394 (https://phabricator.wikimedia.org/T128795) (owner: Fjalapeno)
[13:11:00] (CR) Fjalapeno: "Dereckson - got it. Will put this on the SWAT schedule for next week. Thanks for the help!" [mediawiki-config] - https://gerrit.wikimedia.org/r/279394 (https://phabricator.wikimedia.org/T128795) (owner: Fjalapeno)
[13:12:41] (PS3) BBlack: Add ferm rules for DNS auth servers [puppet] - https://gerrit.wikimedia.org/r/277258 (owner: Muehlenhoff)
[13:14:15] (CR) BBlack: [C: 2] Add ferm rules for DNS auth servers [puppet] - https://gerrit.wikimedia.org/r/277258 (owner: Muehlenhoff)
[13:15:03] !log enabling ferm on authdns servers (one at a time, while watching stuff...)
[13:15:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:15:26] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[13:16:46] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 37.02 ms
[13:19:43] Operations, ops-eqiad: upgrade package_builder machine with SSD - https://phabricator.wikimedia.org/T130759#2150904 (Cmjohnson) We have more than plenty to spare. I currently have (22) Intel 320 Series SSDSA2CW300G3 2.5" on-site.
[13:23:34] (PS1) BBlack: Add base::firewall to role::authdns [puppet] - https://gerrit.wikimedia.org/r/279609
[13:27:20] (CR) BBlack: [C: 2] Add base::firewall to role::authdns [puppet] - https://gerrit.wikimedia.org/r/279609 (owner: BBlack)
[13:34:40] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Puppet last ran 1 day ago
[13:36:28] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[13:48:41] Operations, ops-eqiad: db1067 degraded RAID - https://phabricator.wikimedia.org/T130517#2150941 (Cmjohnson) Swapped disk and is rebuilding cmjohnson@db1067:~$ sudo megacli -PDList -aALL |grep "Firmware state:" Firmware state: Online, Spun Up Firmware state: Rebuild Firmware state: Online, Spun Up Firmwar...
[13:58:26] (PS10) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[13:59:39] (CR) jenkins-bot: [V: -1] Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371) (owner: Mobrovac)
[14:03:07] (PS11) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[14:05:59] PROBLEM - puppet last run on eeden is CRITICAL: Timeout while attempting connection
[14:07:30] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:13:53] (PS12) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[14:16:31] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[14:17:19] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[14:20:40] !log all authdns servers running ferm rules now
[14:20:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:20:49] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[14:22:09] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[14:27:51] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:29:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:38:04] !log depool restbase and drain cassandra from restbase1007
[14:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:41:59] (PS2) BBlack: tlsproxy: don't clobber server header from backend [puppet] - https://gerrit.wikimedia.org/r/279391 (owner: Ori.livneh)
[14:42:24] (CR) BBlack: [C: 2 V: 2] "Tested on pinkunicorn, works as advertised" [puppet] - https://gerrit.wikimedia.org/r/279391 (owner: Ori.livneh)
[14:43:20] !log chown _graphite:_graphite frontend.navtiming.loadEventEnd
[14:43:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:43:46] ori: see ^, frontend.navtiming.loadEventEnd didn't have the right permissions
[14:48:19] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:48:49] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:49:22] ^ saw this /cc bearND|afk
[14:51:09] mdholloway: ty
[14:51:54] (CR) Chad: "Can I get a +1 on this from someone else who can verify? I'll gladly merge if so." [mediawiki-config] - https://gerrit.wikimedia.org/r/233900 (owner: Awight)
[14:51:56] (PS2) Alexandros Kosiaris: ores::base: Remove data_path [puppet] - https://gerrit.wikimedia.org/r/278950
[14:52:05] (PS13) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[14:52:46] (CR) Alexandros Kosiaris: [C: 2 V: 2] ores::base: Remove data_path [puppet] - https://gerrit.wikimedia.org/r/278950 (owner: Alexandros Kosiaris)
[14:53:11] hm, bearND, mdholloway that must have been a temp spike
[14:53:12] (PS2) Alexandros Kosiaris: ores::base: Remove handling of /srv [puppet] - https://gerrit.wikimedia.org/r/278951
[14:53:47] (CR) jenkins-bot: [V: -1] ores::base: Remove handling of /srv [puppet] - https://gerrit.wikimedia.org/r/278951 (owner: Alexandros Kosiaris)
[14:54:19] (CR) Halfak: [C: 1] Add role::ores::web and roles::ores::worker [puppet] - https://gerrit.wikimedia.org/r/278989 (https://phabricator.wikimedia.org/T124201) (owner: Alexandros Kosiaris)
[14:55:00] (PS1) BryanDavis: Logging: Remove sampling from ApiAction kafka channel [mediawiki-config] - https://gerrit.wikimedia.org/r/279614 (https://phabricator.wikimedia.org/T108618)
[14:55:08] (CR) Chad: "Still needed?" [mediawiki-config] - https://gerrit.wikimedia.org/r/271028 (owner: CSteipp)
[14:55:20] mobrovac: was that because of "depool restbase and drain cassandra"? //cc:mdholloway
[14:55:46] not likely, no
[14:56:29] (PS3) Chad: Document db-codfw readOnlyBySection [mediawiki-config] - https://gerrit.wikimedia.org/r/266534 (owner: Dereckson)
[14:56:49] Operations, Ops-Access-Requests, Patch-For-Review, User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2151126 (Dzahn)
[14:56:59] (CR) Chad: [C: 2] Document db-codfw readOnlyBySection [mediawiki-config] - https://gerrit.wikimedia.org/r/266534 (owner: Dereckson)
[14:57:39] (Merged) jenkins-bot: Document db-codfw readOnlyBySection [mediawiki-config] - https://gerrit.wikimedia.org/r/266534 (owner: Dereckson)
[14:57:53] (CR) BryanDavis: "I don't think it is. This log channel is actually used to populate graphite metrics that power the https://grafana.wikimedia.org/dashboard" [mediawiki-config] - https://gerrit.wikimedia.org/r/271028 (owner: CSteipp)
[14:58:32] bearND: the charts aren't revealing anything interesting - http://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=Service+Cluster+B+eqiad&h=scb1001.eqiad.wmnet&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[14:58:40] Operations, ops-eqiad, RESTBase-Cassandra: restbase1007.eqiad.wmnet CPU temperature? - https://phabricator.wikimedia.org/T130370#2151132 (Cmjohnson) Re-applied thermal paste. Let's wait the weekend before closing the task.
[14:58:45] !log demon@tin Synchronized wmf-config/db-codfw.php: docfix (duration: 00m 34s)
[14:58:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:59:16] ostriches: what do you think about pushing this change today? -- https://gerrit.wikimedia.org/r/#/c/279614/
[15:00:04] (PS2) Cmjohnson: Removing dns entries for mw1026-69 per bug task# t129060 [dns] - https://gerrit.wikimedia.org/r/279449
[15:00:08] bd808: Sure
[15:00:16] sweet
[15:01:09] (CR) Chad: [C: 2] Logging: Remove sampling from ApiAction kafka channel [mediawiki-config] - https://gerrit.wikimedia.org/r/279614 (https://phabricator.wikimedia.org/T108618) (owner: BryanDavis)
[15:01:42] (Merged) jenkins-bot: Logging: Remove sampling from ApiAction kafka channel [mediawiki-config] - https://gerrit.wikimedia.org/r/279614 (https://phabricator.wikimedia.org/T108618) (owner: BryanDavis)
[15:01:45] (CR) Cmjohnson: [C: 2] Removing dns entries for mw1026-69 per bug task# t129060 [dns] - https://gerrit.wikimedia.org/r/279449 (owner: Cmjohnson)
[15:02:31] Operations, Labs, wikitech.wikimedia.org: decom old wikitech-static machine - https://phabricator.wikimedia.org/T129391#2151149 (Dzahn) a:Dzahn
[15:03:15] Operations, ops-eqiad, DC-Ops, Patch-For-Review: mw1026-69 are shut down and should be physically decommissioned - https://phabricator.wikimedia.org/T129060#2151150 (Cmjohnson)
[15:03:17] Operations, ops-eqiad, DC-Ops, Patch-For-Review: Decommission mw1037 - https://phabricator.wikimedia.org/T126350#2151151 (Cmjohnson)
[15:03:51] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Remove sampling from ApiAction kafka channel (duration: 00m 33s)
[15:03:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:03:56] bd808: ^^^
[15:04:23] thanks
[15:04:35] Operations, Labs, wikitech.wikimedia.org: decom old wikitech-static machine - https://phabricator.wikimedia.org/T129391#2151154 (Dzahn) @Krenair you think we can go ahead here?
[15:05:38] Operations, Ops-Access-Requests: global root access for gilles - https://phabricator.wikimedia.org/T130910#2151157 (Dzahn)
[15:08:03] (Abandoned) BBlack: cache_upload: remove support for If-Cached [puppet] - https://gerrit.wikimedia.org/r/279544 (owner: BBlack)
[15:09:43] bd808: It's a beautiful morning so I was combing for easy swat type stuff :)
[15:10:13] I kind of figured. Glad I could help :)
[15:13:32] Operations, ops-eqiad, DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2151169 (Cmjohnson) @chasemp Verdict is no go!
[15:14:09] (PS14) Mobrovac: Kafka config: Add config functions [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371)
[15:15:09] cmjohnson1: I'm looking into restbase1007 not coming back up btw
[15:17:01] Operations, ops-eqiad, DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2151173 (Cmjohnson) Open>Resolved 1. We would have to use a non-Dell branded low-profile RAID card 2. The LSI card i did attempt to use did was not recognized by the se...
[15:17:03] Operations, hardware-requests: new labstore hardware for eqiad - https://phabricator.wikimedia.org/T126089#2151176 (Cmjohnson)
[15:17:09] godog: hrm...okay
[15:17:16] let me look at console
[15:19:42] godog: looks up to me
[15:19:47] I've assembled the raid arrays manually from the initramfs, not sure why it did that
[15:19:48] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[15:20:15] yeah..weird
[15:21:08] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[15:21:57] cmjohnson1: anyways up again now, but something we'll need to investigate
[15:22:08] just ran the health checker on scb1001 ^
[15:22:35] godog: may want to create a separate task to investigate
[15:23:48] cmjohnson1: indeed, doing so now
[15:27:04] Operations: restbase1007 not assembling raid after reboot - https://phabricator.wikimedia.org/T130930#2151192 (fgiunchedi)
[15:28:39] (CR) Chad: [C: 2] Remove T44894 FIXME note [mediawiki-config] - https://gerrit.wikimedia.org/r/279140 (owner: Dereckson)
[15:29:12] (Merged) jenkins-bot: Remove T44894 FIXME note [mediawiki-config] - https://gerrit.wikimedia.org/r/279140 (owner: Dereckson)
[15:31:32] Operations, ops-eqiad, RESTBase-Cassandra: restbase1007.eqiad.wmnet CPU temperature? - https://phabricator.wikimedia.org/T130370#2151212 (fgiunchedi) >>! In T130370#2147842, @Cmjohnson wrote: > I will need to power this server off and re-apply thermal paste. LMK a good time to do this. Approx downtime...
[15:32:00] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: Remove T44894 FIXME note (duration: 00m 27s)
[15:32:01] T44894: Please restrict anonymous users from creating new pages at sw.wikipedia - https://phabricator.wikimedia.org/T44894
[15:32:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:38:20] godog: d'oh! thanks
[15:39:16] (PS3) Ladsgroup: ores::base: Remove handling of /srv [puppet] - https://gerrit.wikimedia.org/r/278951 (owner: Alexandros Kosiaris)
[15:39:20] ori: np, when it happens it gets reported by icinga too if you see it again
[15:39:47] CRITICAL: CRITICAL: ori did something stupid.
[15:39:58] (PS3) Mobrovac: service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597
[15:40:51] heheh
[15:42:41] (CR) Mobrovac: [C: 1] "Finally looking good for the compiler - https://puppet-compiler.wmflabs.org/2172/" [puppet] - https://gerrit.wikimedia.org/r/279280 (https://phabricator.wikimedia.org/T130371) (owner: Mobrovac)
[15:43:51] (CR) Ladsgroup: "PS3 is rebase only" [puppet] - https://gerrit.wikimedia.org/r/278951 (owner: Alexandros Kosiaris)
[15:46:56] (PS1) Ema: Install VMODs on Varnish 4 instances [puppet] - https://gerrit.wikimedia.org/r/279617 (https://phabricator.wikimedia.org/T124281)
[15:47:12] (CR) Mobrovac: "OKed by the compiler - https://puppet-compiler.wmflabs.org/2173/" [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[15:54:18] (PS4) Alexandros Kosiaris: ores::base: Remove handling of /srv [puppet] - https://gerrit.wikimedia.org/r/278951
[15:54:50] (CR) jenkins-bot: [V: -1] ores::base: Remove handling of /srv [puppet] - https://gerrit.wikimedia.org/r/278951 (owner: Alexandros Kosiaris)
[16:01:50] (PS4) Mobrovac: service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597
[16:03:51] Operations, MobileFrontend, Traffic, Reading-Web-Sprint-68-"Java and JavaScript are basically the same", and 4 others: Incorrect TOC and section edit links rendering in Vector due to ParserCache corruption via ParserOutput::setText( ParserOutput::getT... - https://phabricator.wikimedia.org/T124356#2151236
[16:06:37] Operations, Analytics-Cluster, hardware-requests: setup/deploy server analytics1003/WMF4541 - https://phabricator.wikimedia.org/T130840#2151253 (RobH) Ok, Brandon rolled back all my changes (since I only have rollback 0). I'm reattempting them more carefully in a single changeset. ``` [edit interf...
[16:10:44] (PS5) Mobrovac: service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597
[16:11:15] (CR) Alexandros Kosiaris: "curious, did you try something like ?" [puppet] - https://gerrit.wikimedia.org/r/279600 (owner: Ema)
[16:12:31] Operations, ops-eqiad: upgrade package_builder machine with SSD - https://phabricator.wikimedia.org/T130759#2151300 (akosiaris) Sounds like a plan to me then ;-)
[16:14:03] (CR) Mobrovac: "PCC says go go go - https://puppet-compiler.wmflabs.org/2175/" [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[16:15:30] (PS1) Ori.livneh: Update logging config for request IDs [mediawiki-config] - https://gerrit.wikimedia.org/r/279621
[16:17:27] (PS2) Andrew Bogott: Update jessie manifest for modern versions of bootstrap-vz [puppet] - https://gerrit.wikimedia.org/r/279463
[16:17:29] (PS1) Andrew Bogott: Designate: Raise default domain quota [puppet] - https://gerrit.wikimedia.org/r/279622
[16:17:34] (CR) Alexandros Kosiaris: [C: 2] service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[16:17:40] (PS6) Alexandros Kosiaris: service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[16:18:32] (CR) Alexandros Kosiaris: [V: 2] service::node: Let Scap3 manage the user by default [puppet] - https://gerrit.wikimedia.org/r/279597 (owner: Mobrovac)
[16:19:24] (PS3) Andrew Bogott: Update jessie manifest for modern versions of bootstrap-vz [puppet] - https://gerrit.wikimedia.org/r/279463
[16:19:58] (PS2) Ori.livneh: Update logging config for request IDs [mediawiki-config] - https://gerrit.wikimedia.org/r/279621
[16:20:11] (CR) Mobrovac: [C: -1] "We did some changes over in I01d291cd729ba012dbcf2e831ac50632ab394add , so this will need to be reworked" [puppet] - https://gerrit.wikimedia.org/r/279415 (owner: Mobrovac)
[16:23:11] Operations, Analytics-Cluster, hardware-requests: setup/deploy server analytics1003/WMF4541 - https://phabricator.wikimedia.org/T130840#2151348 (RobH) The odd part is those don't look like they should be tftp calls, but they only trigger in the log when I attempt to pxe boot.
[16:24:04] (CR) Andrew Bogott: [C: 2] Update jessie manifest for modern versions of bootstrap-vz [puppet] - https://gerrit.wikimedia.org/r/279463 (owner: Andrew Bogott)
[16:25:11] (PS1) Mobrovac: scap::target: Add the $dpeloy)user group ID [puppet] - https://gerrit.wikimedia.org/r/279625
[16:25:38] PROBLEM - puppet last run on aqs1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:25:47] (PS2) Mobrovac: scap::target: Add the $dpeloy_user group ID [puppet] - https://gerrit.wikimedia.org/r/279625
[16:28:38] PROBLEM - puppet last run on restbase1012 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:29:22] (CR) Mobrovac: "OKed by the compiler - https://puppet-compiler.wmflabs.org/2176/" [puppet] - https://gerrit.wikimedia.org/r/279625 (owner: Mobrovac)
[16:29:24] Operations, Analytics-Cluster, hardware-requests: setup/deploy server analytics1003/WMF4541 - https://phabricator.wikimedia.org/T130840#2151364 (RobH) So, despite having confirmed this was likely background noise with @dzahn yesterday, it's continued occurrence made me re-investigate. The iptables dr...
[16:29:54] 6Operations, 10MediaWiki-Interface, 10Traffic: Purge pages cached with mobile editlinks - https://phabricator.wikimedia.org/T125841#2151367 (10BBlack) With the parent resolved, do we need to look at wiping out any lingering corner-cases that are still cached in varnish? [16:30:19] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures [16:31:08] PROBLEM - puppet last run on aqs1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:35:47] akosiaris: puppet failures because of our latest change ^^^ [16:35:51] akosiaris: usermod failing [16:36:43] akosiaris: i'll revert the small change in service::node i added there (for the home dir) [16:36:49] 6Operations, 10ops-codfw, 6DC-Ops: Check bast2001 for hardware problems - https://phabricator.wikimedia.org/T129316#2151376 (10Dzahn) [16:36:51] 6Operations: Reinstall bast1001 with jessie - https://phabricator.wikimedia.org/T123721#2151375 (10Dzahn) [16:36:53] 6Operations, 13Patch-For-Review: reinstall bast2001 with jessie - https://phabricator.wikimedia.org/T128899#2151377 (10Dzahn) [16:37:28] 6Operations: Reinstall bast1001 with jessie - https://phabricator.wikimedia.org/T123721#1936438 (10Dzahn) settings as blocked by bast2001 install / hardware issues. we don't want both bastions to have a potential problem at the same time [16:38:42] (03PS1) 10Mobrovac: service::node: Bring back the user's home value to undef [puppet] - 10https://gerrit.wikimedia.org/r/279629 [16:38:46] akosiaris: ^^^ [16:39:58] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Puppet has 1 failures [16:41:03] (03CR) 10Yuvipanda: [C: 031] "LGTM in theory, needs testing. 
Also add an explanatory comment around the create_resource :)" [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [16:45:08] PROBLEM - puppet last run on restbase1010 is CRITICAL: CRITICAL: Puppet has 1 failures [16:45:16] (03CR) 10Mobrovac: Hieraize keyholder::agent configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [16:47:08] PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: Puppet has 1 failures [16:47:29] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Puppet has 1 failures [16:47:51] (03CR) 10BryanDavis: [C: 031] Update logging config for request IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279621 (owner: 10Ori.livneh) [16:50:35] !log ori@tin Synchronized php-1.27.0-wmf.18/includes: Iaf90c20c330e: Provide a unique request identifier (duration: 01m 22s) [16:50:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:51:23] (03CR) 10Ori.livneh: [C: 032] Update logging config for request IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279621 (owner: 10Ori.livneh) [16:51:50] (03Merged) 10jenkins-bot: Update logging config for request IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279621 (owner: 10Ori.livneh) [16:54:10] bd808: woot: 2016-03-25 16:53:56 [b4c935a3cf2ad2996c1af2e5] tin enwiki 1.27.0-wmf.18 AdHocDebug INFO: ori test [16:54:16] nice [16:55:05] !log ori@tin Synchronized wmf-config/logging.php: Ia2cd5daaf3: Update logging config for request IDs (duration: 00m 31s) [16:55:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:56:19] (03PS1) 10Ori.livneh: Include request ID in profiling data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279631 [16:57:50] (03PS1) 10Dzahn: site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) [16:58:55] (03CR) 10jenkins-bot: [V: 04-1] 
site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [16:59:01] (03CR) 10Dzahn: [C: 04-2] site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [16:59:18] (03PS2) 10Ori.livneh: Include request ID in profiling data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279631 [17:02:39] bd808: is there a way to specify a logstash query via URL parameters? [17:02:50] (03PS2) 10Dzahn: site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) [17:03:08] so that we can generate a link to "logstash.wikimedia.org/?reqId=" + mw.config.get('wgRequestId') ? [17:03:33] hmmm... I don't think that kabana knows how to do that [17:03:50] (03CR) 10jenkins-bot: [V: 04-1] site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [17:04:07] oh looks like you can, sorta: https://github.com/elastic/kibana/issues/168#issuecomment-23811995 [17:06:47] (03PS3) 10Dzahn: site.pp - introduce wasat, backup home dirs [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) [17:08:35] (03CR) 10Dzahn: [C: 032] "wazzat? it's wasat! what's wasat? it's codfw's terbium. 
ah, gotcha" [puppet] - 10https://gerrit.wikimedia.org/r/279632 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [17:08:39] (03PS2) 10Andrew Bogott: Designate: Raise default domain quota [puppet] - 10https://gerrit.wikimedia.org/r/279622 [17:10:38] (03CR) 10Andrew Bogott: [C: 032] Designate: Raise default domain quota [puppet] - 10https://gerrit.wikimedia.org/r/279622 (owner: 10Andrew Bogott) [17:12:18] (03PS4) 10BBlack: varnish: Fix puppet in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/277058 (https://phabricator.wikimedia.org/T129270) (owner: 10Alex Monk) [17:13:00] RECOVERY - puppet last run on aqs1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [17:13:01] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: MediaWiki maintenance host for codfw (terbium's equivalent) - https://phabricator.wikimedia.org/T126987#2151439 (10Dzahn) [17:14:59] PROBLEM - puppet last run on strontium is CRITICAL: CRITICAL: puppet fail [17:15:04] 6Operations, 7Puppet, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Fix puppet on deployment-cache* hosts in beta cluster - https://phabricator.wikimedia.org/T129270#2151441 (10BBlack) @Krenair - I've modified the patch substantially, can you re-check on deployment-prep that it solves the... [17:15:08] RECOVERY - puppet last run on aqs1003 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [17:15:33] (03Abandoned) 10Mobrovac: service::node: Bring back the user's home value to undef [puppet] - 10https://gerrit.wikimedia.org/r/279629 (owner: 10Mobrovac) [17:16:02] matanya: do you remember why python-mysqldb must be on terbium? 
git blame shows you once added it [17:16:16] (03CR) 10BBlack: [C: 031] Install VMODs on Varnish 4 instances [puppet] - 10https://gerrit.wikimedia.org/r/279617 (https://phabricator.wikimedia.org/T124281) (owner: 10Ema) [17:16:38] RECOVERY - puppet last run on aqs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:16:48] matanya: ah! https://gerrit.wikimedia.org/r/#/c/143831/ i wonder how "temporary" it is now [17:18:04] "install the python-mysqldb[1] package on terbium? I would like to use it in a script to collect statistics for our "SUL audit"[2] and for future tasks related to SUL finalization." [17:18:16] that's all a long time ago [17:18:58] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [17:19:02] !log rolling restart of restbase nodes to apply https://gerrit.wikimedia.org/r/#/c/279597/ [17:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:20:09] RECOVERY - puppet last run on strontium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:21:29] 6Operations, 10ops-requests: Install python-mysqldb on terbium - https://phabricator.wikimedia.org/T84075#2151461 (10Dzahn) [17:21:58] (03PS9) 10Alex Monk: openstack: Add proxy panel files [puppet] - 10https://gerrit.wikimedia.org/r/278871 (https://phabricator.wikimedia.org/T129245) [17:22:19] (03PS2) 10Alex Monk: Proxydashboard: Move proxy panel into the 'dns' group [puppet] - 10https://gerrit.wikimedia.org/r/279580 (owner: 10Andrew Bogott) [17:22:24] (03CR) 10Alex Monk: [C: 031] Proxydashboard: Move proxy panel into the 'dns' group [puppet] - 10https://gerrit.wikimedia.org/r/279580 (owner: 10Andrew Bogott) [17:22:58] RECOVERY - puppet last run on restbase1012 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [17:23:05] 6Operations, 10ops-requests: Install python-mysqldb on terbium - 
https://phabricator.wikimedia.org/T84075#922425 (10Dzahn) 5Resolved>3Open re-opening to ask if we can remove this again. when setting up the equivalent of terbium in codfw i noticed this package being installed straight on the node level (meh... [17:23:41] (03CR) 10BBlack: "We need to do something of this sort, what concerns me, though, is the impact of moving these request-mangles of the hostname above the mo" [puppet] - 10https://gerrit.wikimedia.org/r/279564 (https://phabricator.wikimedia.org/T130904) (owner: 10GWicke) [17:23:52] 6Operations, 10ops-requests: Install python-mysqldb on terbium - https://phabricator.wikimedia.org/T84075#2151470 (10Dzahn) a:5coren>3Dzahn [17:24:54] (03PS2) 10Alex Monk: Proxydashboard: Explanatory changes to the 'Create a Proxy' panel [puppet] - 10https://gerrit.wikimedia.org/r/279581 (owner: 10Andrew Bogott) [17:25:03] 6Operations, 10ops-requests: remove (was: Install) python-mysqldb on terbium - https://phabricator.wikimedia.org/T84075#922425 (10Dzahn) [17:25:10] RECOVERY - puppet last run on restbase1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:26:59] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:27:45] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2151480 (10RobH) [17:28:01] 6Operations, 10ops-codfw: rack/setup new host graphite2002 - https://phabricator.wikimedia.org/T130938#2151497 (10RobH) [17:28:08] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:29:17] (03CR) 10Alex Monk: [C: 031] "This makes sense, though I wonder if we should consider pointing out that "hostname" is just the first part of the URL before the domain" [puppet] - 10https://gerrit.wikimedia.org/r/279581 (owner: 10Andrew Bogott) [17:29:32] legoktm: hey, remember this? 
https://phabricator.wikimedia.org/T84075 from back in RT [17:30:22] legoktm: i'm wondering if we still need that on maintenance hosts. (because we are setting up the terbium equivalent in codfw) [17:30:57] (03CR) 10Alex Monk: [C: 031] "Didn't test, but looks correct according to the django docs" [puppet] - 10https://gerrit.wikimedia.org/r/279582 (owner: 10Andrew Bogott) [17:31:22] 6Operations, 10Continuous-Integration-Infrastructure: Update phantomjs to 2.1.1 on trusty - https://phabricator.wikimedia.org/T130940#2151512 (10Paladox) [17:31:23] (03PS2) 10Alex Monk: Proxydashboard: Validate the proxy name with a standard dns hostname regex [puppet] - 10https://gerrit.wikimedia.org/r/279582 (owner: 10Andrew Bogott) [17:32:34] 6Operations, 10ops-codfw: rack five new spare pool systems - https://phabricator.wikimedia.org/T130941#2151524 (10RobH) [17:33:08] 6Operations, 10ops-codfw: rack five new spare pool systems - https://phabricator.wikimedia.org/T130941#2151541 (10RobH) [17:33:08] (03PS2) 10Alex Monk: Proxydashboard: Check for ownership of a proxy before deleting. 
[puppet] - 10https://gerrit.wikimedia.org/r/279583 (owner: 10Andrew Bogott) [17:33:33] (03CR) 10Alex Monk: [C: 031] "I would suggest trying to handle that exception but IIRC there are issues associated with that and DeleteAction/BatchAction" [puppet] - 10https://gerrit.wikimedia.org/r/279583 (owner: 10Andrew Bogott) [17:37:02] (03PS1) 10Dzahn: mw:maintenance: move python-mysqldb from nodes to role [puppet] - 10https://gerrit.wikimedia.org/r/279637 (https://bugzilla.wikimedia.org/129930) (https://phabricator.wikimedia.org/T84075) [17:37:07] 6Operations, 10ops-codfw, 6DC-Ops, 13Patch-For-Review: setup new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2151573 (10Dzahn) [17:37:52] (03PS3) 10Alexandros Kosiaris: scap::target: Add the $deploy_user group ID [puppet] - 10https://gerrit.wikimedia.org/r/279625 (owner: 10Mobrovac) [17:37:58] (03PS4) 10Alexandros Kosiaris: scap::target: Add the $deploy_user group ID [puppet] - 10https://gerrit.wikimedia.org/r/279625 (owner: 10Mobrovac) [17:38:05] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] scap::target: Add the $deploy_user group ID [puppet] - 10https://gerrit.wikimedia.org/r/279625 (owner: 10Mobrovac) [17:38:24] (03PS2) 10Dzahn: mw:maintenance: move python-mysqldb from nodes to role [puppet] - 10https://gerrit.wikimedia.org/r/279637 (https://bugzilla.wikimedia.org/129930) (https://phabricator.wikimedia.org/T84075) [17:38:51] 6Operations, 10Continuous-Integration-Infrastructure, 6Labs: Update phantomjs to 2.1.1 on trusty - https://phabricator.wikimedia.org/T130940#2151595 (10Paladox) [17:40:23] 6Operations, 6Labs, 10wikitech.wikimedia.org: decom old wikitech-static machine - https://phabricator.wikimedia.org/T129391#2151635 (10Krenair) Yep [17:43:07] mutante: I think at some point legoktm learnt php :D [17:44:17] yuvipanda: so you say now we need php5-mysql instead now? 
heh [17:44:26] :P [17:44:43] (03PS1) 10Dduvall: contint: increase tmpfs size to 384M (+128M) [puppet] - 10https://gerrit.wikimedia.org/r/279640 [17:52:41] (03CR) 10Dzahn: [C: 032] mw:maintenance: move python-mysqldb from nodes to role [puppet] - 10https://gerrit.wikimedia.org/r/279637 (https://bugzilla.wikimedia.org/129930) (https://phabricator.wikimedia.org/T84075) (owner: 10Dzahn) [17:58:56] wikibugs is dead again [18:00:39] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:02:55] wasn't me this time [18:06:10] (03PS1) 10Dzahn: mw:maintenance: move eventlogging from node to role [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) [18:06:39] RECOVERY - puppet last run on scb2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:08:20] RECOVERY - puppet last run on scb2001 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [18:08:32] (03PS2) 10Dzahn: mw:maintenance: move eventlogging from node to role [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) [18:09:09] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:09:33] (03PS3) 10Dzahn: mw:maintenance: move eventlogging from node to role [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) [18:11:35] (03PS4) 10Dzahn: mw:maintenance: move eventlogging from node to role [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) [18:12:47] 6Operations, 10ops-requests, 13Patch-For-Review: remove (was: Install) python-mysqldb on terbium - https://phabricator.wikimedia.org/T84075#2151732 (10Dzahn) [18:12:49] 6Operations, 10ops-codfw, 6DC-Ops, 13Patch-For-Review: setup new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2151730 (10Dzahn) 
[18:12:57] 6Operations, 10ops-codfw, 6DC-Ops, 13Patch-For-Review: setup new mw maint host - wasat - https://phabricator.wikimedia.org/T129930#2120304 (10Dzahn) a:5Joe>3Dzahn [18:16:38] (03CR) 10Dzahn: "this lets mw1152 also include this, that surprised me at first, but it's because that happens to also include the mw:maintenance role, so " [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) (owner: 10Dzahn) [18:19:17] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/2179/ makes mw2090 equal to terbium and will ensure it gets on wasat" [puppet] - 10https://gerrit.wikimedia.org/r/279643 (https://phabricator.wikimedia.org/T112660) (owner: 10Dzahn) [18:27:53] !log restbase started mobile-sections dump of enwiki on restbase1009 for articles edited before 2016-03-23 as per T130698 [18:27:54] T130698: Cannot edit latter section in Wikipedia app (when using Content Service) - https://phabricator.wikimedia.org/T130698 [18:27:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:33:26] bd808: works! https://logstash.wikimedia.org/#/dashboard/elasticsearch/request-id?id=dd8dc172023012d2633810f4 , https://logstash.wikimedia.org/#/dashboard/elasticsearch/request-id?id=69b281d1010b902aab19fb8e [18:33:27] 6Operations, 7Puppet, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Fix puppet on deployment-cache* hosts in beta cluster - https://phabricator.wikimedia.org/T129270#2151780 (10Krenair) >>! In T129270#2151441, @BBlack wrote: > @Krenair - I've modified the patch substantially, can you re-c... 
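The two request-id dashboard links pasted at 18:33:26 share one pattern: a fixed dashboard URL followed by `?id=` and the request ID. A minimal sketch of generating such a link, assuming that pattern holds for any ID; the function name `logstash_link_for_request` is hypothetical and not taken from the discussion:

```python
from urllib.parse import quote

# Base URL copied from the links pasted in the channel.
LOGSTASH_REQUEST_DASHBOARD = (
    "https://logstash.wikimedia.org/#/dashboard/elasticsearch/request-id"
)


def logstash_link_for_request(request_id: str) -> str:
    """Return a dashboard link for one request ID (hypothetical helper)."""
    if not request_id:
        raise ValueError("request_id must not be empty")
    # Percent-encode the ID so unexpected characters cannot break the URL.
    return f"{LOGSTASH_REQUEST_DASHBOARD}?id={quote(request_id, safe='')}"


if __name__ == "__main__":
    print(logstash_link_for_request("dd8dc172023012d2633810f4"))
```

On the client side the same string concatenation would presumably use `mw.config.get('wgRequestId')` as asked at 17:03:08; the Python form here is only for illustration.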
[18:35:10] (03PS1) 10Yuvipanda: tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) [18:36:19] (03CR) 10jenkins-bot: [V: 04-1] tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) (owner: 10Yuvipanda) [18:37:12] (03PS2) 10Yuvipanda: tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) [18:38:16] (03PS5) 10BBlack: varnish: Fix puppet in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/277058 (https://phabricator.wikimedia.org/T129270) (owner: 10Alex Monk) [18:39:25] (03PS3) 10Yuvipanda: tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) [18:43:11] (03PS1) 10Dzahn: mediawiki: maintenance servers should all have home backup [puppet] - 10https://gerrit.wikimedia.org/r/279649 (https://phabricator.wikimedia.org/T129930) [18:45:34] (03PS2) 10Dzahn: mediawiki: maintenance servers should all have home backup [puppet] - 10https://gerrit.wikimedia.org/r/279649 (https://phabricator.wikimedia.org/T129930) [18:46:55] (03CR) 10Paladox: [C: 031] "Yes please, we keep getting memory errors because we have to keep downloading phantomjs and extract it which can use a lot of memory." 
[puppet] - 10https://gerrit.wikimedia.org/r/279640 (owner: 10Dduvall) [18:53:50] (03CR) 10BBlack: [C: 032] varnish: Fix puppet in deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/277058 (https://phabricator.wikimedia.org/T129270) (owner: 10Alex Monk) [18:56:39] RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy [18:57:31] 6Operations, 7Puppet, 10Beta-Cluster-Infrastructure, 10Traffic, 13Patch-For-Review: Fix puppet on deployment-cache* hosts in beta cluster - https://phabricator.wikimedia.org/T129270#2151889 (10Krenair) 5Open>3Resolved [19:00:24] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/2180/" [puppet] - 10https://gerrit.wikimedia.org/r/279649 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [19:00:54] (03PS3) 10Dzahn: mediawiki: maintenance servers should all have home backup [puppet] - 10https://gerrit.wikimedia.org/r/279649 (https://phabricator.wikimedia.org/T129930) [19:01:53] (03CR) 10Dzahn: "no-op terbium, mw2090. added on mw1152" [puppet] - 10https://gerrit.wikimedia.org/r/279649 (https://phabricator.wikimedia.org/T129930) (owner: 10Dzahn) [19:02:00] PROBLEM - changeprop endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=7272): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [19:08:01] !log scb1001 - Unit changeprop.service entered failed state - it's a new deployment though, acked earlier by mobrovac already [19:08:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:09:18] mobrovac: i see in backlog that you ACKed this earlier, because new deploy. [19:14:18] yuvipanda: remember this? https://gerrit.wikimedia.org/r/#/c/163141/ should we say all mw-maintenance hosts are also LDAP client hosts? [19:14:52] should i make it part of the role [19:15:05] mutante: aaaah, nope. 
It just needs to be *a* host, and there's no reason to tie that to mw deployment [19:15:14] they're totally unrelated functionality [19:15:18] yuvipanda: it's not tied to deployment [19:15:23] it's tied to mw-maintenance [19:15:50] there's nothing related to mw in there [19:15:53] nor to mw-maintenance [19:16:21] could make a separate role for it that makes more sense though [19:16:28] unlike our current 'ldap' module which makes no sense [19:16:38] i'd like to have it in _a_ role [19:16:44] it doesn't have to be the mw role [19:16:53] but just want it the same in both DCs [19:17:27] and avoid doing stuff on the node/hostname level [19:17:29] so yes [19:18:03] also for failover to other dc [19:18:30] yea.. eh ldap::role::client [19:18:45] should be role::ldap::client [19:20:12] mutante: indeed :) if you look at the ldap 'module' I'm sure you'll be incredibly horrified [19:22:03] uhm.. yea.. i'll come up with something. not sure what first then [19:22:03] :) [19:22:47] just making the terbium equivalent and already reduced the number of things done straight in site.pp [19:22:52] bbiaw [19:24:36] 6Operations, 10ops-eqiad: db1067 degraded RAID - https://phabricator.wikimedia.org/T130517#2138224 (10Volans) All looking good, rebuild in progress: ``` $ sudo megacli -PDRbld -ShowProg -PhysDrv [32:1] -aALL Rebuild Progress on Device at Enclosure 32, Slot 1 Completed 67% in 337 Minutes. 
``` [19:26:18] RECOVERY - changeprop endpoints health on scb1001 is OK: All endpoints are healthy [19:30:01] (03PS1) 10Dzahn: mw:maintenance: add role to wasat [puppet] - 10https://gerrit.wikimedia.org/r/279659 (https://phabricator.wikimedia.org/T129930) [19:32:19] !log running extensions/CentralAuth/maintenance/checkLocal{Names,User}.php on jawiktionary and dewikiquote [19:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:40:20] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0] [19:54:09] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [20:01:49] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:01:59] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:02:39] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:02:48] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:02:50] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:03:59] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [20:16:39] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:16:48] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:16:49] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:18:19] (03PS1) 10Ori.livneh: Integrate X-Wikimedia-Debug with Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 [20:18:59] (03CR) 10jenkins-bot: [V: 04-1] Integrate X-Wikimedia-Debug with Logstash [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/279667 (owner: 10Ori.livneh) [20:21:58] PROBLEM - Check size of conntrack table on kafka1012 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:21:59] PROBLEM - Check size of conntrack table on kafka1013 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:22:48] PROBLEM - Check size of conntrack table on kafka1022 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:22:49] PROBLEM - Check size of conntrack table on kafka1018 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [20:23:10] PROBLEM - Check size of conntrack table on kafka1020 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [20:23:30] PROBLEM - Check size of conntrack table on kafka1014 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [20:29:30] (03PS1) 10Yuvipanda: Update uidenforcer to work with v1.2.0 release [debs/kubernetes] - 10https://gerrit.wikimedia.org/r/279668 [20:30:08] (03PS2) 10Ori.livneh: Integrate X-Wikimedia-Debug with Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 [20:39:11] bd808: ^ [20:44:02] mmmmm don't really like the above alarms [20:45:06] (03CR) 10Ori.livneh: "Staged on mw1099, works well." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 (owner: 10Ori.livneh) [20:46:44] !log ori@tin Synchronized php-1.27.0-wmf.18/includes/filerepo/LocalRepo.php: Idaa1237638: Request-local caching of image_redirect and I545ce6b160b: Lower pcTTL in checkRedirect() to 30 (duration: 00m 37s) [20:46:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:49:16] (03CR) 10Yuvipanda: [C: 032 V: 032] Update uidenforcer to work with v1.2.0 release [debs/kubernetes] - 10https://gerrit.wikimedia.org/r/279668 (owner: 10Yuvipanda) [20:51:21] (03PS4) 10Yuvipanda: tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) [20:54:49] RECOVERY - carbon-cache write error on graphite1001 is OK: OK: Less than 1.00% above the threshold [1.0] [20:55:23] weird thing on kafka1020 - netstat -tunap shows ~700 connections, most of them established, meanwhile /proc/net/ip_conntrack shows tons of TIME_WAITs [20:56:27] I am worried that if we don't bump net.netfilter.nf_conntrack_max we could get into packets dropped [20:56:53] net.netfilter.nf_conntrack_count doesn't seem to be increasing [20:57:32] at least not very rapidly [20:58:36] anybody that can keep an eye on the kafka hosts? [20:58:50] I was just checking IRC but have to go :( [21:01:15] elukey: what would you like me to do? [21:01:59] ori: o/ [21:02:18] it might be nothing but I was a bit worried about those conntrack errors [21:02:51] might be needed to bump net.netfilter.nf_conntrack_max if the situation gets worse [21:03:05] should we do it anyway, just so this doesn't blow up over the weekend? [21:03:26] could be a good solution [21:04:08] netstat seems clear, it is something weird with nf_conntrack.. maybe a different view of TIME_WAITs? 
[21:05:24] (03PS1) 10Yuvipanda: Update uidenforcer to work with v1.2.0 release [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279674 [21:05:45] (03CR) 10Yuvipanda: [C: 032 V: 032] Update uidenforcer to work with v1.2.0 release [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279674 (owner: 10Yuvipanda) [21:07:37] (03PS1) 10BryanDavis: Logging: convert to short array syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279675 [21:08:18] (03CR) 10BryanDavis: [C: 031] "Follow up commit at I28b137f5e0358b9b54c217c2e76ed7d117f0fae8 to convert the whole file to short array syntax." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 (owner: 10Ori.livneh) [21:08:54] elukey: conntrack -L is showing connections that netstat is not, but I don't know why, or even whether that is abnormal or not [21:10:45] elukey: http://serverfault.com/a/313317 "Conntrack module remembers recent connections for X seconds before they finally expire. [...] netstat, on the other hand, shows real-time information and is not interested about ancient history" [21:12:13] (03CR) 10Ori.livneh: [C: 032] Integrate X-Wikimedia-Debug with Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 (owner: 10Ori.livneh) [21:13:24] (03Merged) 10jenkins-bot: Integrate X-Wikimedia-Debug with Logstash [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279667 (owner: 10Ori.livneh) [21:13:35] (03CR) 10Ori.livneh: [C: 032] Logging: convert to short array syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279675 (owner: 10BryanDavis) [21:13:49] ori: yeah I found more or less the same.. 
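The "nf_conntrack is 90 % full" alerts above compare the two sysctls named in the discussion, `net.netfilter.nf_conntrack_count` and `net.netfilter.nf_conntrack_max`. A rough sketch of that arithmetic, assuming the values are read from the usual `/proc/sys/net/netfilter/` files on Linux; the function names and the 90 % default are illustrative assumptions and are not taken from the actual Icinga check:

```python
from pathlib import Path
from typing import Tuple

# Standard Linux location of the netfilter sysctls mentioned in the channel.
PROC_NETFILTER = Path("/proc/sys/net/netfilter")


def read_conntrack_usage(base: Path = PROC_NETFILTER) -> Tuple[int, int]:
    """Return (tracked entries, table capacity) read from procfs."""
    count = int((base / "nf_conntrack_count").read_text().strip())
    maximum = int((base / "nf_conntrack_max").read_text().strip())
    return count, maximum


def conntrack_fill_percent(count: int, maximum: int) -> float:
    """Percentage of the conntrack table currently in use."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return 100.0 * count / maximum


def is_critical(count: int, maximum: int, critical_percent: float = 90.0) -> bool:
    # The 90 % default is an assumption based on the alert text, not the real check.
    return conntrack_fill_percent(count, maximum) >= critical_percent


if __name__ == "__main__":
    # Requires a Linux host with the nf_conntrack module loaded.
    used, limit = read_conntrack_usage()
    print(f"nf_conntrack is {conntrack_fill_percent(used, limit):.0f} % full")
```

Because the table keeps recently closed connections (for example in TIME_WAIT) until they expire, this percentage can stay high after `netstat` already shows few live connections, which matches what was observed on kafka1020.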
[21:14:03] (03Merged) 10jenkins-bot: Logging: convert to short array syntax [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279675 (owner: 10BryanDavis) [21:14:18] elukey: so there was a surge of connections, which had abated by the time you ran netstat, but which were still visible in conntrack [21:14:29] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 51.72% of data above the critical threshold [5000000.0] [21:14:40] ---^ totally unrelated [21:14:45] and known bug [21:15:42] what's the bug? [21:16:29] in kafka <= 0.8.. sometimes a replica incorrectly decides that it hasn't collected anything from its master and starts from scratch [21:16:43] causing a lag because it has to fetch a lot of data [21:17:04] it should go away with the new kafka version :) [21:17:29] and that is not associated with a spike in connections, when it starts from scratch? [21:17:55] (I don't mean to keep you, if you have to go, but I can't promise to look after this, since I have to leave in half an hour too) [21:18:12] theoretically no, at least from what I know (https://grafana.wikimedia.org/dashboard/db/kafka?panelId=29&fullscreen) [21:18:45] sorry https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen better, that shows the temp spike [21:18:48] for 1022 [21:18:49] anyhoowww [21:19:24] gotta go now, just wanted to have some discussion.. 
thanks for your time :) I'll try to double check the conntracks tomorrow [21:19:33] ori: --^ [21:19:39] np, sorry I couldn't help more [21:19:58] !log ori@tin Synchronized wmf-config/logging.php: Ibc96f9d3bd: Integrate X-Wikimedia-Debug with Logstash; I28b137f5e035: Logging: convert to short array syntax (duration: 00m 36s) [21:20:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:22:12] (03PS1) 10Yuvipanda: Fix tests to run with v1.2.0 [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279677 [21:23:25] last message for everybody: if the issue gets worse with nf_conntrack and kafka here's a handy code review done by moritzm for the jobrunners: https://gerrit.wikimedia.org/r/#/c/278290/2/modules/role/manifests/mediawiki/jobrunner.pp [21:24:10] byyyeee [21:28:28] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0] [21:29:08] the changeprop service is spewing a lot of "worker 8288 died (1), restarting." messages at logstash [21:29:30] gwicke: ^ [21:31:01] bd808: okay, any opsens with access around? 
[21:32:05] Error: Cannot find module '/srv/deployment/changeprop/deploy-cache/revs/86a8484d77add545dbb23454b5605722e24f3104/node_modules/src/sys/kafka.js' [21:32:11] looks like some deployment system problem [21:35:41] this service is not used for anything, so shutting it down should be fine [21:38:05] (03PS2) 10Yuvipanda: Fix tests to run with v1.2.0 [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279677 [21:48:49] RECOVERY - Check size of conntrack table on kafka1022 is OK: OK: nf_conntrack is 79 % full [21:49:19] RECOVERY - Check size of conntrack table on kafka1020 is OK: OK: nf_conntrack is 79 % full [21:49:40] RECOVERY - Check size of conntrack table on kafka1014 is OK: OK: nf_conntrack is 79 % full [21:49:48] RECOVERY - Check size of conntrack table on kafka1012 is OK: OK: nf_conntrack is 79 % full [21:49:50] RECOVERY - Check size of conntrack table on kafka1013 is OK: OK: nf_conntrack is 79 % full [21:52:51] !log Updated Grafana to latest nightly [21:52:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:54:18] RECOVERY - Check size of conntrack table on kafka1018 is OK: OK: nf_conntrack is 79 % full [22:02:40] (03PS1) 10Yuvipanda: Fix case of "Pods" to make tests pass [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279679 [22:03:16] (03CR) 10Yuvipanda: [C: 032 V: 032] Fix tests to run with v1.2.0 [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279677 (owner: 10Yuvipanda) [22:03:28] (03CR) 10Yuvipanda: [C: 032 V: 032] Fix case of "Pods" to make tests pass [software/kubernetes] - 10https://gerrit.wikimedia.org/r/279679 (owner: 10Yuvipanda) [22:07:09] (03PS2) 10Andrew Bogott: Horizon: Update session config [puppet] - 10https://gerrit.wikimedia.org/r/279186 (https://phabricator.wikimedia.org/T130621) [22:08:30] (03CR) 10Andrew Bogott: [C: 032] Horizon: Update session config [puppet] - 10https://gerrit.wikimedia.org/r/279186 (https://phabricator.wikimedia.org/T130621) (owner: 10Andrew Bogott) 
[22:08:53] (03PS10) 10Andrew Bogott: openstack: Add proxy panel files [puppet] - 10https://gerrit.wikimedia.org/r/278871 (https://phabricator.wikimedia.org/T129245) (owner: 10Alex Monk) [22:09:07] (03PS3) 10Andrew Bogott: Proxydashboard: Move proxy panel into the 'dns' group [puppet] - 10https://gerrit.wikimedia.org/r/279580 [22:09:14] (03PS3) 10Andrew Bogott: Proxydashboard: Explanatory changes to the 'Create a Proxy' panel [puppet] - 10https://gerrit.wikimedia.org/r/279581 [22:09:24] (03PS3) 10Andrew Bogott: Proxydashboard: Validate the proxy name with a standard dns hostname regex [puppet] - 10https://gerrit.wikimedia.org/r/279582 [22:09:31] (03PS3) 10Andrew Bogott: Proxydashboard: Check for ownership of a proxy before deleting. [puppet] - 10https://gerrit.wikimedia.org/r/279583 [22:11:25] (03CR) 10Andrew Bogott: [C: 032] Proxydashboard: Explanatory changes to the 'Create a Proxy' panel [puppet] - 10https://gerrit.wikimedia.org/r/279581 (owner: 10Andrew Bogott) [22:11:27] (03CR) 10Andrew Bogott: [C: 032] Proxydashboard: Validate the proxy name with a standard dns hostname regex [puppet] - 10https://gerrit.wikimedia.org/r/279582 (owner: 10Andrew Bogott) [22:11:41] (03CR) 10Andrew Bogott: [C: 032] Proxydashboard: Check for ownership of a proxy before deleting. 
[puppet] - 10https://gerrit.wikimedia.org/r/279583 (owner: 10Andrew Bogott) [22:24:14] (03PS5) 10Yuvipanda: tools: Add class that helps build kubernetes [puppet] - 10https://gerrit.wikimedia.org/r/279648 (https://phabricator.wikimedia.org/T129311) [22:24:29] (03PS1) 10Dzahn: remove mw-maintenance, LDAP client from mw2090 [puppet] - 10https://gerrit.wikimedia.org/r/279680 [22:31:18] RECOVERY - RAID on db1067 is OK: OK: optimal, 1 logical, 6 physical [22:39:37] (03PS1) 10Dzahn: ldap: rename role classes [puppet] - 10https://gerrit.wikimedia.org/r/279682 [22:45:33] (03PS2) 10Dzahn: ldap: rename role classes [puppet] - 10https://gerrit.wikimedia.org/r/279682 [22:51:58] PROBLEM - Disk space on ms-be2008 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdl1 is not accessible: Input/output error [22:52:09] PROBLEM - RAID on ms-be2008 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) [22:53:31] PROBLEM - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures [22:58:18] (03PS1) 10GWicke: Drop wmf prefix for resource_change topic [puppet] - 10https://gerrit.wikimedia.org/r/279685 [23:02:38] RECOVERY - Disk space on ms-be2008 is OK: DISK OK [23:10:02] (03CR) 10Dduvall: "Cherry-picked on integration-puppetmaster for testing" [puppet] - 10https://gerrit.wikimedia.org/r/279640 (owner: 10Dduvall) [23:21:35] bd808: ack, will stop the service everywhere [23:21:45] (sorry forgot to do it earlier) [23:44:09] (03PS2) 10Dzahn: contint: increase tmpfs size to 384M (+128M) [puppet] - 10https://gerrit.wikimedia.org/r/279640 (owner: 10Dduvall) [23:45:01] (03CR) 10Dzahn: [C: 032] contint: increase tmpfs size to 384M (+128M) [puppet] - 10https://gerrit.wikimedia.org/r/279640 (owner: 10Dduvall) [23:45:14] mutante: thanks! [23:45:38] 6Operations, 13Patch-For-Review, 15User-mobrovac: Replace role::kafka::*::config classes with puppet functions. 
- https://phabricator.wikimedia.org/T130371#2152523 (10mobrovac) p:5Triage>3Normal [23:47:00] 7Blocked-on-Operations, 6Operations, 10EventBus, 6Services, and 3 others: New Service Request - Change Propagation - https://phabricator.wikimedia.org/T128463#2152527 (10mobrovac) [23:47:43] marxarelli: yw [23:48:07] (03CR) 10Dduvall: "Thanks! I've removed the cherry pick and fetched the latest master on integration-puppetmaster." [puppet] - 10https://gerrit.wikimedia.org/r/279640 (owner: 10Dduvall)