[00:01:44] operations, Adminbot, Patch-For-Review: Upload new release of adminbot for Trusty - https://phabricator.wikimedia.org/T109947#1652026 (Dzahn) root@toolsbeta-exec-202:~# apt-get install adminbot Preparing to unpack .../adminbot_1.7.13_all.deb ... Unpacking adminbot (1.7.13) over (1.7.8) ... .. ii adm...
[00:01:57] operations, Adminbot, Patch-For-Review: Upload new release of adminbot for Trusty - https://phabricator.wikimedia.org/T109947#1652027 (Dzahn) Open>Resolved
[00:02:11] operations, Adminbot: Upload new release of adminbot for Trusty - https://phabricator.wikimedia.org/T109947#1563644 (Dzahn)
[00:02:53] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: puppet fail
[00:03:36] operations, RESTBase, Services: RESTBase logging broken in both production & staging - https://phabricator.wikimedia.org/T112985#1652033 (ori) RESTBase is configured to log to logstash1001. (I don't think we have a service IP / LVS for logstash -- fixing that should be a priority.) On logstash1001, I...
[00:04:23] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652035 (Ocaasi) Just so I don't feel left out, Wikipedia Library list is getting one every few hours.
[00:14:43] !log restarted logstash on logstash1001
[00:14:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:21:49] (CR) Yuvipanda: "Do we still care about this? Also does exported resources work in beta? It doesn't rebase cleanly either." [puppet] - https://gerrit.wikimedia.org/r/179121 (owner: Giuseppe Lavagetto)
[00:25:04] PROBLEM - Restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[00:25:27] gwicke: ^
[00:26:31] yuvipanda: yeah, I'm testing things there
[00:26:35] let me ack
[00:26:43] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[00:28:43] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[00:28:59] (PS1) Ori.livneh: logstash: add monitoring for logstash process [puppet] - https://gerrit.wikimedia.org/r/239307 (https://phabricator.wikimedia.org/T112985)
[00:29:04] paravoid, gwicke ^
[00:29:49] (there does not appear to be any monitoring, in case you were wondering)
[00:32:33] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:32:42] * ori merges
[00:32:51] (CR) Ori.livneh: [C: 2] logstash: add monitoring for logstash process [puppet] - https://gerrit.wikimedia.org/r/239307 (https://phabricator.wikimedia.org/T112985) (owner: Ori.livneh)
[00:35:03] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
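Ori's patch adds a process check for logstash; when it later fires on logstash1001 it reports "PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash". A minimal Python sketch of that matching logic, assuming the process records have already been collected (for example from /proc). The `ProcInfo` type and both functions are hypothetical illustrations, not the Nagios `check_procs` plugin or the puppet code in the patch, and treating "at least one matching process" as OK is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProcInfo:
    """Hypothetical minimal process record (not a real library type)."""
    uid: int
    comm: str  # command name, e.g. "java"
    args: str  # full command line


def count_matching(procs: Iterable[ProcInfo], uid: int, comm: str, args_substr: str) -> int:
    """Count processes owned by `uid` whose command name equals `comm`
    and whose command line contains `args_substr`."""
    return sum(
        1 for p in procs
        if p.uid == uid and p.comm == comm and args_substr in p.args
    )


def check_logstash(procs: Iterable[ProcInfo], uid: int = 998) -> str:
    """Return a check_procs-style status line for the logstash JVM.

    Assumption: CRITICAL when no matching process exists, OK otherwise.
    """
    n = count_matching(procs, uid, "java", "logstash")
    state = "OK" if n >= 1 else "CRITICAL"
    return (f"PROCS {state}: {n} processes with UID = {uid} (logstash), "
            "command name java, args logstash")
```

Matching on UID, command name and an argument substring together matters here: a JVM is just `java`, so the command name alone cannot tell logstash apart from any other Java service on the host.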
[00:35:34] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:46:07] (PS1) Alex Monk: Make commons and wikimania SUL logos transparent [mediawiki-config] - https://gerrit.wikimedia.org/r/239308 (https://phabricator.wikimedia.org/T72829)
[00:49:35] PROBLEM - Restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[00:50:13] (Abandoned) Yuvipanda: redirects for develop{,er{,s}}.wiki{p,m}edia.org [apache-config] - https://gerrit.wikimedia.org/r/24407 (owner: Jeremyb)
[00:51:24] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[00:58:42] (CR) Yuvipanda: "@Qchris is this good for merging / review now?" [puppet] - https://gerrit.wikimedia.org/r/237753 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[01:00:31] (CR) Yuvipanda: "@Qchris where does this need to be checked? On which host?" [puppet] - https://gerrit.wikimedia.org/r/238976 (owner: QChris)
[01:01:49] (CR) Yuvipanda: [C: -2] "NOPE. What's wrong with ["a", "b"]? Say no to Perlisms!" [puppet] - https://gerrit.wikimedia.org/r/238778 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[01:02:12] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652098 (Jalexander) >>! In T112912#1651887, @Platonides wrote: > I don't really know what's the point in doing this (flood someone's phone?) Is there any reasonable...
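The graphite-based checks in this log report the share of recent datapoints over a threshold, e.g. "6.67% of data above the critical threshold [500.0]" and "Less than 1.00% above the threshold [250.0]". A sketch of that arithmetic, assuming null datapoints are skipped and that one percentage trigger (1% here, inferred only from the recovery message) gates both the critical and warning levels. The function names and the WARNING wording are hypothetical; this is not the actual check plugin.

```python
from typing import Optional, Sequence


def percent_over(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    values = [v for v in datapoints if v is not None]  # assumption: nulls ignored
    if not values:
        return 0.0
    over = sum(1 for v in values if v > threshold)
    return 100.0 * over / len(values)


def check_threshold(datapoints: Sequence[Optional[float]], warning: float,
                    critical: float, percentage: float) -> str:
    """Alert when at least `percentage` percent of points exceed a threshold."""
    crit_pct = percent_over(datapoints, critical)
    if crit_pct >= percentage:
        return f"CRITICAL: {crit_pct:.2f}% of data above the critical threshold [{critical}]"
    warn_pct = percent_over(datapoints, warning)
    if warn_pct >= percentage:
        return f"WARNING: {warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return f"OK: Less than {percentage:.2f}% above the threshold [{warning}]"
```

One way to arrive at 6.67% is a single datapoint above 500 out of 15 in the window (1/15), which is also why one short spike is enough to page when the trigger is as low as 1%.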
[01:03:08] jynus,
[01:03:10] -rwxr-xr-x 1 root root 13374 Sep 3 16:24 db-codfw.php
[01:03:12] That's part of mediawiki-config, I don't think it's supposed to be root owned
[01:03:14] pretty sure db-eqiad used to be like this, but it sounds like you fixed it
[01:03:19] (CR) Alex Monk: "Ew." [puppet] - https://gerrit.wikimedia.org/r/238778 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[01:04:29] (CR) Yuvipanda: [C: -2] "Yes, puppet code guidelines say yes to trailing commas, let's keep them consistent otherwise it'll be weird." [puppet] - https://gerrit.wikimedia.org/r/238779 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[01:05:07] PROBLEM - logstash process on logstash1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash
[01:07:41] (CR) Alex Monk: "It's merged now. Can this be abandoned?" [puppet] - https://gerrit.wikimedia.org/r/111387 (owner: Jeremyb)
[01:07:43] yuvipanda, ^
[01:08:46] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652104 (Krinkle) NEW
[01:09:06] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652116 (Krinkle)
[01:09:15] (Abandoned) Yuvipanda: rm root cert from chain [puppet] - https://gerrit.wikimedia.org/r/111387 (owner: Jeremyb)
[01:09:40] (CR) Alex Monk: "Only 2 supporters?" [mediawiki-config] - https://gerrit.wikimedia.org/r/237049 (https://phabricator.wikimedia.org/T111753) (owner: MarcoAurelio)
[01:13:09] (PS2) Yuvipanda: Tessera: base config.py.erb on Tessera's config.py [puppet] - https://gerrit.wikimedia.org/r/222365 (owner: Ori.livneh)
[01:14:09] ty yuvipanda
[01:14:20] (CR) Yuvipanda: [C: 2 V: 2] "Asssssummmming this is ok and the others are the defaults anyway." [puppet] - https://gerrit.wikimedia.org/r/222365 (owner: Ori.livneh)
[01:14:32] krrrit-wm: I'm going through things trying to -2 and abandon obvious things
[01:16:21] (Abandoned) Yuvipanda: Revert "base: ensure => absent on 'command-not-found'" [puppet] - https://gerrit.wikimedia.org/r/233156 (owner: Alex Monk)
[01:18:08] (Abandoned) Yuvipanda: labstore: Be less noisy in logging [puppet] - https://gerrit.wikimedia.org/r/223075 (owner: Yuvipanda)
[01:18:23] (Abandoned) Alex Monk: Redirect most noc.wikimedia.org/conf URLs to Diffusion [puppet] - https://gerrit.wikimedia.org/r/224214 (owner: Alex Monk)
[01:18:38] (CR) Ottomata: "I think so. Please contact kleduc@wikimedia.org and ask." [puppet] - https://gerrit.wikimedia.org/r/197081 (https://phabricator.wikimedia.org/T83531) (owner: ArielGlenn)
[01:18:40] (Abandoned) Alex Monk: Get rid of most of noc.wikimedia.org/conf [mediawiki-config] - https://gerrit.wikimedia.org/r/222942 (owner: Alex Monk)
[01:18:58] (CR) Yuvipanda: "Should we abandon this or should we merge this? @apergos?" [puppet] - https://gerrit.wikimedia.org/r/219134 (owner: ArielGlenn)
[01:19:21] (CR) Alex Monk: "I added them to the reviewer list." [puppet] - https://gerrit.wikimedia.org/r/197081 (https://phabricator.wikimedia.org/T83531) (owner: ArielGlenn)
[01:19:28] Krenair: so my hope is that we spend some time cleaning up old stuff, and merging / getting reviews from people, and most new patches go through puppetswat
[01:23:07] (CR) Yuvipanda: "Who runs wikivoyage-ev? I guess it doesn't matter for us since this is a simple referrer check - would still be good to know." [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[01:23:24] (CR) Yuvipanda: "Is varnish, so I won't merge :)" [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[01:24:16] (PS2) Yuvipanda: Update ssh key for brion [puppet] - https://gerrit.wikimedia.org/r/239155 (owner: Brion VIBBER)
[01:24:29] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652166 (Krinkle) Looking more closely, it happened around these three events: {F2614171 height=233} https://wikitech.wik...
[01:24:55] (CR) Yuvipanda: [C: 2 V: 2] Update ssh key for brion [puppet] - https://gerrit.wikimedia.org/r/239155 (owner: Brion VIBBER)
[01:26:15] (CR) Yurik: "http://wikivoyage-ev.org/wiki/%C3%9Cber_uns ... in German." [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[01:26:17] (PS2) Yuvipanda: Make Phabricator's license footer cover uploads / "content" [puppet] - https://gerrit.wikimedia.org/r/239073 (owner: Aklapper)
[01:26:23] (CR) Yuvipanda: [C: 2 V: 2] Make Phabricator's license footer cover uploads / "content" [puppet] - https://gerrit.wikimedia.org/r/239073 (owner: Aklapper)
[01:26:36] (CR) Alex Monk: [C: -1] "per ticket" [mediawiki-config] - https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471) (owner: Nemo bis)
[01:27:11] (CR) Yuvipanda: "I agree with @Alex. I don't see a problem with this at all, but can you file an access request, @andre__?" [puppet] - https://gerrit.wikimedia.org/r/219151 (owner: Aklapper)
[01:27:29] yuvipanda while you update brion's key, can you please update mine - it's been sitting for 7 days
[01:27:39] (PS2) Yuvipanda: [WIP] Enable automatic redirect to mobile Wikidata [puppet] - https://gerrit.wikimedia.org/r/238396 (https://phabricator.wikimedia.org/T111015) (owner: Bene)
[01:27:48] https://gerrit.wikimedia.org/r/#/c/237473/
[01:28:06] (CR) Alex Monk: [C: -1] "open dependency in another repo" [mediawiki-config] - https://gerrit.wikimedia.org/r/207909 (owner: Paladox)
[01:28:08] (CR) Yuvipanda: [C: -2] "Assuming this is what was meant by DNM :) Marked as WIP and -2ing until informed otherwise." [puppet] - https://gerrit.wikimedia.org/r/238396 (https://phabricator.wikimedia.org/T111015) (owner: Bene)
[01:28:26] (PS2) Yuvipanda: Changed yurik's pub key [puppet] - https://gerrit.wikimedia.org/r/237473 (owner: Yurik)
[01:28:52] (CR) Yuvipanda: [C: 2 V: 2] Changed yurik's pub key [puppet] - https://gerrit.wikimedia.org/r/237473 (owner: Yurik)
[01:29:05] * yurik has been kidnapped and will obey some evil overlord's bidding
[01:29:19] time to test the key now :)
[01:29:20] PROBLEM - Restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[01:29:49] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused
[01:30:11] (PS3) Alex Monk: Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237331 (owner: Nemo bis)
[01:30:17] operations, RESTBase, Services, Patch-For-Review: RESTBase logging broken in both production & staging - https://phabricator.wikimedia.org/T112985#1652175 (ori) Open>Resolved a:ori
[01:31:41] Krenair: want me to merge https://gerrit.wikimedia.org/r/#/c/232675/ so you can test?
[01:32:25] thanks, worked
[01:32:27] yuvipanda, yes please
[01:32:33] I had been hoping for ori to merge it, but...
[01:32:56] (PS6) Yuvipanda: Make foreachwikiindblist accept dblist expressions [puppet] - https://gerrit.wikimedia.org/r/232675 (https://phabricator.wikimedia.org/T101213) (owner: Alex Monk)
[01:32:56] So apparently, practically all of *.m* requests have client side caching disabled.
[01:33:06] (CR) Yuvipanda: [C: 2 V: 2] Make foreachwikiindblist accept dblist expressions [puppet] - https://gerrit.wikimedia.org/r/232675 (https://phabricator.wikimedia.org/T101213) (owner: Alex Monk)
[01:33:26] And it used to be fine for mobile, because mobile wasn't using mobile domains to load static assets (e.g. load.php)
[01:33:39] bits.wm.o was fine, desktop domains are fine, too.
[01:33:51] and then we switched load.php to mobile domain and everything turned to shit
[01:33:59] Krenair: going to puppet agent on tin now
[01:34:14] thanks
[01:35:04] (CR) Alex Monk: [C: 2] Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237331 (owner: Nemo bis)
[01:35:25] (Merged) jenkins-bot: Configure $wgExtraSignatureNamespaces for it.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/237331 (owner: Nemo bis)
[01:35:51] Krenair: done
[01:35:52] test?
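The discussion above says that once load.php moved to the mobile domains, browser caching was effectively disabled for those requests, while bits and the desktop domains were fine. A sketch of how one might classify a response's `Cache-Control` value, following a simplified reading of RFC 7234 (`no-store`, `no-cache`, `max-age`). The function names are hypothetical, the example header values are illustrative rather than captured from production, and `Expires`, heuristic freshness and quoted field lists such as `no-cache="a, b"` are deliberately not handled.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument or None}.

    Simplification: splits on every comma, so quoted lists are not supported.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def browser_may_cache(value: str) -> bool:
    """True if a browser may reuse the response without revalidating it."""
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d:
        return False
    max_age = d.get("max-age")  # s-maxage only applies to shared caches
    if max_age is None:
        return False  # assumption: ignore heuristic freshness
    try:
        return int(max_age) > 0
    except ValueError:
        return False
```

For instance, `browser_may_cache("public, max-age=300, s-maxage=300")` is `True`, while `"private, s-maxage=0, max-age=0, must-revalidate"` yields `False`: the second form still lets Varnish-style shared caches be purged centrally but forces every browser to go back to the network.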
[01:36:04] ok, one sec
[01:36:08] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237331/ (duration: 00m 12s)
[01:36:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:36:24] dealing with the mediawiki-config queue :)
[01:38:48] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[01:38:58] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15229 bytes in 0.012 second response time
[01:39:38] yuvipanda, looks good
[01:41:13] thanks
[01:44:31] yuvipanda, https://gerrit.wikimedia.org/r/#/c/236499/ - that file is unused right?
[01:45:08] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[01:46:49] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10514 bytes in 1.105 second response time
[01:46:56] yuvipanda, what about https://gerrit.wikimedia.org/r/239278 ?
[01:48:15] operations, RESTBase, Services, Patch-For-Review: RESTBase logging broken in both production & staging - https://phabricator.wikimedia.org/T112985#1652188 (GWicke) Resolved>Open Thanks, @ori. I'll keep this open for now to track some follow-up work: - move away from logstash1001, to avoid be...
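The "Restbase endpoints health on xenon" alerts in this log come from a failed request to `http://127.0.0.1:7231/en.wikipedia.org/v1/?spec`, and recover once the service answers again. The real checker evidently tests several endpoints ("All endpoints are healthy"); this Python sketch only reproduces the first step, fetching the spec URL and mapping the outcome to a Nagios-style line. The function name and message texts are hypothetical; the host, port and path are the ones shown in the alert.

```python
import urllib.error
import urllib.request


def check_spec(host: str = "127.0.0.1", port: int = 7231,
               path: str = "/en.wikipedia.org/v1/?spec",
               timeout: float = 5.0) -> str:
    """Fetch the RESTBase spec URL and map the outcome to a status line."""
    url = f"http://{host}:{port}{path}"
    # A local health check should not be routed through any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return f"CRITICAL: {url} returned HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # refused, timed out, ...
        return f"CRITICAL: connection error for {url}: {exc}"
    if status == 200:
        return "OK: spec endpoint is reachable"
    return f"CRITICAL: {url} returned HTTP {status}"
```

`HTTPError` is caught before `URLError` because it is a subclass; otherwise an HTTP 500 from a running but broken service would be misreported as a connection failure like the "Connection refused" seen while the service was being tested.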
[01:53:01] (CR) Alex Monk: [C: -1] "-1ing to get this out of the review queue, please feel free to remove when this is ready" [mediawiki-config] - https://gerrit.wikimedia.org/r/235467 (https://phabricator.wikimedia.org/T109707) (owner: Addshore)
[01:54:19] (CR) Alex Monk: [C: -1] "-1ing to get this out of the review queue, please feel free to remove if the other change is going through" [mediawiki-config] - https://gerrit.wikimedia.org/r/237029 (https://phabricator.wikimedia.org/T105202) (owner: Florianschmidtwelzow)
[01:54:30] PROBLEM - IPsec on cp1059 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp4011_v6
[01:56:08] (CR) Alex Monk: [C: 2] Enabling extension WikiLove for outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235907 (https://phabricator.wikimedia.org/T106264) (owner: MarcoAurelio)
[01:56:13] Krenair, sorry I didn't merge it
[01:56:16] yuvipanda: thanks
[01:56:19] RECOVERY - IPsec on cp1059 is OK: Strongswan OK - 24 ESP OK
[01:56:31] (Merged) jenkins-bot: Enabling extension WikiLove for outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/235907 (https://phabricator.wikimedia.org/T106264) (owner: MarcoAurelio)
[01:57:37] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://phabricator.wikimedia.org/T106264 (duration: 00m 12s)
[01:57:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:58:09] !log ori@tin Synchronized php-1.26wmf23/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: ResourceLoaderModule: cache file content hash (duration: 00m 11s)
[01:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:58:34] !log ori@tin Synchronized php-1.26wmf22/includes/resourceloader/ResourceLoaderModule.php: I952068d2d: ResourceLoaderModule: cache file content hash (duration: 00m 12s)
[01:58:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:58:47] Krenair: I'll look at it in 5mins
[01:59:28] k
[02:05:35] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652202 (Risker) This has happened before, to varying extents. It strikes me that someone has found a way to spoof the WMF mailing list addresses, and the ATT group...
[02:05:56] (CR) Alex Monk: [C: 2] Update of permissions at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/234544 (https://phabricator.wikimedia.org/T110674) (owner: MarcoAurelio)
[02:06:21] (Merged) jenkins-bot: Update of permissions at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/234544 (https://phabricator.wikimedia.org/T110674) (owner: MarcoAurelio)
[02:07:24] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/234544/ (duration: 00m 12s)
[02:07:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:12:39] (CR) Alex Monk: [C: 2] Modify user rights configuration at frwiki for changetags [mediawiki-config] - https://gerrit.wikimedia.org/r/237149 (https://phabricator.wikimedia.org/T98629) (owner: MarcoAurelio)
[02:13:02] (Merged) jenkins-bot: Modify user rights configuration at frwiki for changetags [mediawiki-config] - https://gerrit.wikimedia.org/r/237149 (https://phabricator.wikimedia.org/T98629) (owner: MarcoAurelio)
[02:13:30] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/237149/ (duration: 00m 12s)
[02:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:16:46] (CR) Alex Monk: [C: 2] Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[02:16:49] (CR) jenkins-bot: [V: -1] Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[02:20:25] (PS3) Alex Monk: Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[02:20:41] (CR) Alex Monk: [C: 2] Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[02:20:47] (Merged) jenkins-bot: Enwiki: remove changetags from user and add it to sysop, bot and abusefilter groups [mediawiki-config] - https://gerrit.wikimedia.org/r/218353 (https://phabricator.wikimedia.org/T97013) (owner: Cenarium)
[02:21:28] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/218353/ (duration: 00m 11s)
[02:21:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:21:49] !log krenair@tin Synchronized wmf-config/abusefilter.php: https://gerrit.wikimedia.org/r/#/c/218353/ (duration: 00m 12s)
[02:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:27:04] ori, https://phabricator.wikimedia.org/T113012 was that the thing that broke a while ago?
[02:27:19] and then got fixed but left some pages parsed with errors?
[02:28:38] !log l10nupdate@tin Synchronized php-1.26wmf23/cache/l10n: l10nupdate for 1.26wmf23 (duration: 06m 08s)
[02:28:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:29:51] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652239 (Krinkle)
[02:31:49] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf23) at 2015-09-18 02:31:49+00:00
[02:31:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:36:49] Krenair: yeah, most likely
[02:37:03] did it have a ticket?
[02:38:08] probably not
[02:57:08] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=12.90 Read Requests/Sec=37.79 Write Requests/Sec=28.64 KBytes Read/Sec=151.16 KBytes_Written/Sec=13873.77
[02:58:58] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=1.30 Read Requests/Sec=115.12 Write Requests/Sec=0.30 KBytes Read/Sec=460.46 KBytes_Written/Sec=1.20
[03:04:28] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:04:32] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-others/snapshot is not accessible: Permission denied
[03:06:00] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10514 bytes in 0.096 second response time
[03:24:57] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652280 (mcruzWMF) Same for wikimedia-ped Thanks to all who are working to solve this! :-)
[03:25:29] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[03:32:38] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=261.17 Read Requests/Sec=742.57 Write Requests/Sec=2.43 KBytes Read/Sec=2970.27 KBytes_Written/Sec=10.92
[03:36:08] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=34.87 Read Requests/Sec=89.02 Write Requests/Sec=0.40 KBytes Read/Sec=356.09 KBytes_Written/Sec=2.01
[03:49:13] Krenair: still around for me to merge patches?
[03:49:32] yes
[03:49:55] Krenair: can you link them again?
[03:50:00] irccloud is being shitty again
[03:50:31] up for doing the labs dns one? :)
[03:51:49] Krenair: hahahaa, no :)
[03:51:55] Krenair: let me try to do that in the morning tomorrow
[03:52:05] ok
[03:52:19] https://gerrit.wikimedia.org/r/#/c/236499/ - I think that file is unused
[03:52:39] and https://gerrit.wikimedia.org/r/#/c/239278/
[03:52:52] (CR) Yuvipanda: "Is this ready to test / merge, hashar? I still see a check failing." [software/puppet-compiler] - https://gerrit.wikimedia.org/r/233360 (owner: Hashar)
[03:55:26] (PS2) Yuvipanda: rm templates/misc/udpmxircecho.py.erb [puppet] - https://gerrit.wikimedia.org/r/236499 (owner: Alex Monk)
[03:56:00] (CR) Yuvipanda: [C: 2 V: 2] "It got moved to mw-rc-irc and is being referenced from there instead." [puppet] - https://gerrit.wikimedia.org/r/236499 (owner: Alex Monk)
[03:56:40] Krenair: done
[03:56:46] I'll wait for a while to see if there are any puppet failures
[03:57:10] Krenair: not gonna touch the tcpircbot one though :)
[03:57:45] Krenair: our backlog isn't too bad, at least as linked to from the gerrit patch cleanup day :)
[03:58:18] (Abandoned) Yuvipanda: Revert "Revert "ganeti: move role from manifests/ into the role module"" [puppet] - https://gerrit.wikimedia.org/r/230773 (owner: Alexandros Kosiaris)
[04:00:30] yuvipanda, compared to..? :)
[04:01:11] (CR) Yuvipanda: "I think the shortest path to do here is to just move the misc/limn.pp into a role and just rename it. However, Limn is running off a self " [puppet] - https://gerrit.wikimedia.org/r/231144 (owner: Faidon Liambotis)
[04:01:27] Krenair: to what I thought it was going to be
[04:02:50] ah
[04:03:12] (Abandoned) Yuvipanda: Revive the old ceph module [puppet] - https://gerrit.wikimedia.org/r/212914 (owner: Andrew Bogott)
[04:03:20] (Abandoned) Yuvipanda: Revert "Remove role::ceph::*, unused now" [puppet] - https://gerrit.wikimedia.org/r/212938 (owner: Andrew Bogott)
[04:04:09] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-maps/snapshot is not accessible: Permission denied
[04:05:12] Krenair: weren't you doing something similar to https://gerrit.wikimedia.org/r/#/c/175442/
[04:05:40] yuvipanda, I didn't make a jenkins plugin for it
[04:05:58] * Krenair digs out the relevant tickets
[04:06:20] (PS2) Yuvipanda: dbtree: Remove obsolete .gitignore [software] - https://gerrit.wikimedia.org/r/205085 (owner: Tim Landscheidt)
[04:06:29] (CR) Yuvipanda: [C: 2 V: 2] dbtree: Remove obsolete .gitignore [software] - https://gerrit.wikimedia.org/r/205085 (owner: Tim Landscheidt)
[04:06:49] yuvipanda, https://phabricator.wikimedia.org/T108078#1512371
[04:07:09] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [100000000.0]
[04:07:30] doesn't require ldaplist itself, it just contains a little bit of ldaplist's code
[04:08:09] bam, krrrit-wm restarted when it crashed automatically wooo
[04:08:42] so I think I'm going to make a small script that sorts open patchsets by owner
[04:09:38] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=3537.67 Read Requests/Sec=2799.19 Write Requests/Sec=397.88 KBytes Read/Sec=18667.88 KBytes_Written/Sec=1591.52
[04:14:48] yuvipanda, so did you see https://gerrit.wikimedia.org/r/#/c/239278/ ?
[04:15:39] yuvipanda: https://www.mediawiki.org/wiki/Gerrit/Reports/Changesets_by_owner
[04:15:56] errr
[04:15:57] yuvipanda: https://www.mediawiki.org/wiki/Gerrit/Reports/Open_changesets_by_owner
[04:16:00] that one!
[04:17:35] legoktm: ah
[04:17:44] legoktm: but I want it only for ops repos
[04:18:07] Krenair: yeah, can you put that up for puppetswat?
[04:18:11] not updated since June..
[04:18:19] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=53.56 Read Requests/Sec=59.78 Write Requests/Sec=0.40 KBytes Read/Sec=239.12 KBytes_Written/Sec=1.60
[04:18:35] if I write this in perl6
[04:18:38] will people hate me?
[04:18:42] also you don't need a jenkins plugin for the ldap ssh keys thing
[04:18:47] ops puppet just uses tox
[04:18:56] so add it to tox.ini and jenkins will run it
[04:20:22] yuvipanda, what are you writing?
[04:21:15] just a 'give it a gerrit query and it'll spit out a git-shortlog style owners + patches'
[04:21:31] and then I'll email the list to the owners in ops repos individually and ask them to either merge them or abandon them
[04:21:44] ah
[04:21:57] there's a large amount of changes from apergos on the operations/software repository from August that are unmerged, for example
[04:23:18] hmm
[04:23:21] I'm trying to query gerrit
[04:23:29] for list of patches from people who aren't in ldap/ops
[04:23:45] legoktm, it does require the ldap library to be installed though
[04:23:53] yuvipanda, -ownerin:ldap/ops
[04:24:38] whee
[04:24:38] yes
[04:24:40] I did that
[04:24:44] it just took forever
[04:24:47] but that fits into one page :D
[04:24:51] and a lot of that stuff is WIP
[04:25:18] legoktm, does the jenkins host have the ldap python module?
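The script described above takes a Gerrit query and prints a git-shortlog-style list of owners and their patches, and the `-ownerin:ldap/ops` operator from the log narrows it to people outside ops. A minimal Python sketch, assuming Gerrit's REST `/changes/` endpoint with the `DETAILED_ACCOUNTS` option (so each owner carries a `name`). The function names are hypothetical, pagination (`_more_changes`) is not handled, and group-membership operators may need an authenticated request to resolve.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

GERRIT = "https://gerrit.wikimedia.org/r"


def parse_gerrit_json(body: str):
    """Strip Gerrit's ")]}'" anti-XSSI prefix line, then decode the JSON."""
    if body.startswith(")]}'"):
        body = body.split("\n", 1)[1]
    return json.loads(body)


def fetch_changes(query: str, base: str = GERRIT):
    """Run a change query; DETAILED_ACCOUNTS makes owner names available.
    Only the first page is returned (_more_changes is ignored)."""
    url = (f"{base}/changes/?q={urllib.parse.quote(query)}"
           "&o=DETAILED_ACCOUNTS")
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def shortlog(changes) -> str:
    """Group changes by owner, git-shortlog style: busiest owners first,
    each followed by its change numbers and subjects."""
    by_owner = defaultdict(list)
    for change in changes:
        by_owner[change["owner"].get("name", "(unknown)")].append(change)
    lines = []
    for owner, items in sorted(by_owner.items(),
                               key=lambda kv: (-len(kv[1]), kv[0])):
        lines.append(f"{owner} ({len(items)}):")
        for change in sorted(items, key=lambda c: c["_number"]):
            lines.append(f"      {change['_number']} {change['subject']}")
        lines.append("")
    return "\n".join(lines)
```

For example, `print(shortlog(fetch_changes("status:open project:operations/puppet -ownerin:ldap/ops")))` would list open puppet changes by owner, assuming the query resolves for the caller. Keeping the grouping separate from the HTTP call makes it easy to reuse the same output for the per-owner emails mentioned in the log.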
[04:26:03] haha
[04:26:09] bit.ly thinks that that gerrit URL isn't valid
[04:27:48] (Abandoned) Yuvipanda: WIP: Hierator puppetization [puppet] - https://gerrit.wikimedia.org/r/202743 (owner: MaxSem)
[04:29:23] (Abandoned) Yuvipanda: Phabricator: Create diffusion puppet role [puppet] - https://gerrit.wikimedia.org/r/222987 (https://phabricator.wikimedia.org/T104827) (owner: Negative24)
[04:29:56] (PS4) Yuvipanda: phab: Add passwd entries for vcs user [puppet] - https://gerrit.wikimedia.org/r/226573 (owner: Negative24)
[04:30:38] (CR) Yuvipanda: [C: -2] "Shouldn't set passwords in puppet." [puppet] - https://gerrit.wikimedia.org/r/226573 (owner: Negative24)
[04:35:09] (CR) Alex Monk: "Please see https://phabricator.wikimedia.org/T108078#1512371" [puppet] - https://gerrit.wikimedia.org/r/175442 (owner: ArielGlenn)
[04:35:30] (PS2) Yuvipanda: Remove libmemcached10 [puppet] - https://gerrit.wikimedia.org/r/158023 (owner: Reedy)
[04:35:45] operations, Labs, Patch-For-Review: audit labs versus production ssh keys - https://phabricator.wikimedia.org/T108078#1652356 (Krenair) Looks like https://gerrit.wikimedia.org/r/#/c/175442/ did something like this
[04:35:55] (CR) Yuvipanda: [V: 2] "Yes, no need to explicitly list it since it's a dependency of php5-memcached and so will get installed anyway." [puppet] - https://gerrit.wikimedia.org/r/158023 (owner: Reedy)
[04:36:20] (CR) Yuvipanda: [C: 2] "Yes, no need to explicitly list it since it's a dependency of php5-memcached and so will get installed anyway." [puppet] - https://gerrit.wikimedia.org/r/158023 (owner: Reedy)
[04:36:43] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652357 (MZMcBride) >>! In T112912#1652035, @Ocaasi wrote: > Just so I don't feel left out, Wikipedia Library list is getting one every few hours. I'm up to 30 of t...
[04:37:42] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Sep 18 04:37:42 UTC 2015 (duration 37m 41s)
[04:37:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:38:00] yuvipanda, you could do https://gerrit.wikimedia.org/r/#/c/227505/
[04:38:57] (PS5) Yuvipanda: Add python3 script to populate meta_p [software] - https://gerrit.wikimedia.org/r/227505 (https://phabricator.wikimedia.org/T107094) (owner: Alex Monk)
[04:40:16] (CR) Yuvipanda: [C: 2] "Wheeee! We can sanitize this later but it looks like a straight port." [software] - https://gerrit.wikimedia.org/r/227505 (https://phabricator.wikimedia.org/T107094) (owner: Alex Monk)
[04:40:25] (CR) Yuvipanda: [V: 2] Add python3 script to populate meta_p [software] - https://gerrit.wikimedia.org/r/227505 (https://phabricator.wikimedia.org/T107094) (owner: Alex Monk)
[04:40:30] Krenair: done
[04:41:17] (CR) Yuvipanda: [C: -2] "@Chad: is this still WIP? marking as such based on commit message." [puppet] - https://gerrit.wikimedia.org/r/207377 (owner: Chad)
[04:41:35] yuvipanda, you could point it at a separate database, dump both and compare?
[04:43:04] (CR) Yuvipanda: "@apergos can you take a look at this please?" [dumps] - https://gerrit.wikimedia.org/r/155080 (https://bugzilla.wikimedia.org/51225) (owner: MaxSem)
[04:43:14] Krenair: I thought you had already done that? :)
[04:43:23] Krenair: but yeah that seems prudent
[04:43:33] yuvipanda, I had
[04:43:46] but you want to review it further right?
[04:44:03] Krenair: well, you can just do it again and dump it and if there's no diffs...
[04:48:17] (CR) Yuvipanda: "@Jan - this looks like a good candidate for puppetswat. Think you can put it up for coming Tuesday?" [puppet] - https://gerrit.wikimedia.org/r/229426 (https://phabricator.wikimedia.org/T84060) (owner: JanZerebecki)
[04:48:28] jzerebecki: I responded on https://gerrit.wikimedia.org/r/#/c/229426/
[04:49:00] Krenair: what do you think of https://gerrit.wikimedia.org/r/#/c/227079/
[04:49:15] Ugh.
[04:49:37] I'd rather not have TS pages on mediawiki.org
[04:49:49] (CR) Yuvipanda: "Note that we have no https certificates for wiki.toolserver.org." [puppet] - https://gerrit.wikimedia.org/r/227079 (https://phabricator.wikimedia.org/T62220) (owner: Nemo bis)
[04:50:14] (CR) Yuvipanda: "and toolserver.org itself runs on labs, and isn't handled by this apache rule at all." [puppet] - https://gerrit.wikimedia.org/r/227079 (https://phabricator.wikimedia.org/T62220) (owner: Nemo bis)
[04:50:23] Krenair: that patch isn't going to work as well
[04:51:07] (CR) Yuvipanda: "Is this still a thing we care about? It's almost been a month since last activity :)" [puppet] - https://gerrit.wikimedia.org/r/227335 (https://phabricator.wikimedia.org/T106619) (owner: GWicke)
[04:51:48] (CR) Yuvipanda: [C: -1] "So this is a full NOP." [puppet] - https://gerrit.wikimedia.org/r/227079 (https://phabricator.wikimedia.org/T62220) (owner: Nemo bis)
[04:52:59] (Abandoned) GWicke: Lower the InitiatingHeapOccupancyPercent from 45% to 35% [puppet] - https://gerrit.wikimedia.org/r/227335 (https://phabricator.wikimedia.org/T106619) (owner: GWicke)
[04:53:02] (CR) Yuvipanda: "*poke* should this be abandoned?" [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/223202 (owner: EBernhardson)
[04:53:10] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[04:53:10] ebernhardson: should this be abandoned? https://gerrit.wikimedia.org/r/#/c/223202/
[04:53:19] gwicke: thanks
[04:53:49] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[04:54:21] yuvipanda, also, do we have pymysql and python3 installed on the server where you're planning to run maintain-meta_p.py?
[04:54:30] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[04:55:16] yuvipanda, also turns out the code no longer fully works
[04:56:02] Krenair: I'm not sure which machine we'll be running it from
[04:56:48] (CR) Yuvipanda: [C: -1] nagios_common: Delay default evaluation of template() (1 comment) [puppet] - https://gerrit.wikimedia.org/r/237561 (https://phabricator.wikimedia.org/T111982) (owner: Tim Landscheidt)
[04:56:53] Krenair: dunno, but can it be installed from pypi?
[04:57:00] legoktm, the ldap lib?
[04:57:13] yeah
[04:57:52] or if it's packaged, you can add it to the contint puppet module
[04:58:08] (CR) Deskana: "I would take the lack of a response as a "yes". It can always be restored later if it's important. I would abandon it myself now, but I am" [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/223202 (owner: EBernhardson)
[04:58:29] think it's from the python-ldap package legoktm
[04:59:50] (Abandoned) Yuvipanda: Add statsd reporting plugin [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/223202 (owner: EBernhardson)
[05:00:02] Deskana|Away: thanks!
[05:00:08] PROBLEM - YARN NodeManager Node-State on analytics1032 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:01:42] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652369 (Tbayer) Has someone tried just calling the number and explaining the situation? (It seems to be a real US phone number with a North Carolina area code, with...
[05:01:49] RECOVERY - YARN NodeManager Node-State on analytics1032 is OK: OK: YARN NodeManager analytics1032.eqiad.wmnet:8041 Node-State: RUNNING
[05:01:56] legoktm@integration-slave-trusty-1016:~$ python
[05:01:56] Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[05:01:56] [GCC 4.8.2] on linux2
[05:01:56] Type "help", "copyright", "credits" or "license" for more information.
[05:01:56] >>> import ldap
[05:01:57] legoktm: SyntaxError: Unexpected reserved word
[05:01:58] >>>
[05:02:07] shoo, ecmabot-wm
[05:02:09] ohai ecmabot-wm
[05:02:21] >>> print("This is python!")
[05:02:21] legoktm: undefined; Console: 'This is python!'
[05:02:50] krmabot-wm
[05:05:19] (PS2) Yuvipanda: shinken: Make shinkengen compatible with ldap3 0.9.4.2 [puppet] - https://gerrit.wikimedia.org/r/238190 (https://phabricator.wikimedia.org/T101824) (owner: Tim Landscheidt)
[05:05:32] (CR) Yuvipanda: [C: 2 V: 2] "Thanks for the patch!" [puppet] - https://gerrit.wikimedia.org/r/238190 (https://phabricator.wikimedia.org/T101824) (owner: Tim Landscheidt)
[05:05:55] legoktm, so basically it's installed, k
[05:06:18] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[05:07:28] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[05:10:59] (PS3) Yuvipanda: nagios_common: Delay default evaluation of template() [puppet] - https://gerrit.wikimedia.org/r/237561 (https://phabricator.wikimedia.org/T111982) (owner: Tim Landscheidt)
[05:11:28] (PS4) Yuvipanda: nagios_common: Delay default evaluation of template() [puppet] - https://gerrit.wikimedia.org/r/237561 (https://phabricator.wikimedia.org/T111982) (owner: Tim Landscheidt)
[05:11:36] (CR) Yuvipanda: [C: 2 V: 2] nagios_common: Delay default evaluation of template() [puppet] - https://gerrit.wikimedia.org/r/237561 (https://phabricator.wikimedia.org/T111982) (owner: Tim Landscheidt)
[05:12:24] Krenair: were you going to get ori to review the tcpircbot changes?
[05:12:37] Krenair: what were you trying to do with it anyway?
[05:12:40] replace ircecho?
[05:13:11] yes, something like that
[05:13:27] legoktm: btw I just fixed up shinken-01, you should stop getting messages about integration sooon
[05:14:02] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1652383 (Dzahn) You can go to the following URL, where you have to replace with the name of your list: https://lists.wikimedia.org/mailman/admin/ yuvipanda: yaaaaaaay, thank you :D
[05:15:35] legoktm: yw
[05:15:37] (PS4) Yuvipanda: Wrap lvs class in has_lvs hiera variable [puppet] - https://gerrit.wikimedia.org/r/202611 (https://phabricator.wikimedia.org/T91560) (owner: Thcipriani)
[05:16:07] yuvipanda, when was maintain-replicas last run?
[05:16:13] (CR) Yuvipanda: [C: 2] Wrap lvs class in has_lvs hiera variable [puppet] - https://gerrit.wikimedia.org/r/202611 (https://phabricator.wikimedia.org/T91560) (owner: Thcipriani)
[05:16:27] (CR) Yuvipanda: [V: 2] "Simple rebase." [puppet] - https://gerrit.wikimedia.org/r/202611 (https://phabricator.wikimedia.org/T91560) (owner: Thcipriani)
[05:16:31] Krenair: not sure
[05:16:36] Krenair: it's always been a Coren thing
[05:17:53] Krenair: do you need to port this to the python one as well? https://gerrit.wikimedia.org/r/#/c/221042/
[05:17:55] Well it seems to think that aawiki's has_echo is 0...
[05:18:09] That's already part of it, yuvipanda :)
[05:18:16] ah cool :)
[05:18:20] Krenair: should we abandon this then?
[05:18:24] hmm
[05:18:26] probably not?
[05:18:37] I've added it to list of patches I want coren to look at
[05:18:46] not until I've got rid of all of the meta_p stuff out of maintain-replicas.pl
[05:19:01] right
[05:19:02] actually I could just turn it into that
[05:19:29] I'm seeing old stuff in here (the real labs meta_p) that I changed weeks ago
[05:20:05] (CR) Yuvipanda: "Do you still want to make this happen?" [puppet] - https://gerrit.wikimedia.org/r/198173 (https://phabricator.wikimedia.org/T91548) (owner: Thcipriani)
[05:24:34] operations, Wikimedia-Mailing-lists, Patch-For-Review: run /var/lib/mailman/bin/update and ./check_perms - https://phabricator.wikimedia.org/T113020#1652396 (Dzahn) NEW a:Dzahn
[05:27:07] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1652405 (Dzahn)
[05:27:58] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1450909 (Dzahn)
[05:50:36] (Abandoned) Giuseppe Lavagetto: coredb::s1: use hiera, role [puppet] - https://gerrit.wikimedia.org/r/185921 (owner: Giuseppe Lavagetto)
[06:00:30] (CR) Yuvipanda: [C: -2] "Nope. There's already a different role that does all the beta stuff (check any of the other similar instances on beta for what that is). P" [puppet] - https://gerrit.wikimedia.org/r/170130 (owner: Cscott)
[06:15:43] !log elastic in eqiad plugin updates: restarting elastic1012
[06:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:29:50] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: puppet fail
[06:30:29] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:09] PROBLEM - puppet last run on mc2005 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:28] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:39] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:59] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:09] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:19] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:29] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:40] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:58] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:09] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:55:49] RECOVERY - puppet last run on mc2005 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:56:48] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:49] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:56:59] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:56:59] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:57:19] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:28] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:48] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:48] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:58] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:59] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:20] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:51] (CR) QChris: "@YuviPanda: On ytterbium." [puppet] - https://gerrit.wikimedia.org/r/238976 (owner: QChris)
[07:06:43] (CR) QChris: "But I guess I should add that Gerrit $SOMETIMES chokes a" [puppet] - https://gerrit.wikimedia.org/r/238976 (owner: QChris)
[07:09:19] (CR) QChris: "@Yuvipanda: It's certainly good for review. But it should" [puppet] - https://gerrit.wikimedia.org/r/237753 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[07:18:13] (CR) MarcoAurelio: [C: 1] "Display ok on my laptop. I think they're OK. Is optiPNG needed here?" [mediawiki-config] - https://gerrit.wikimedia.org/r/239308 (https://phabricator.wikimedia.org/T72829) (owner: Alex Monk)
[07:21:36] !log elastic in eqiad plugin updates: restarting elastic1013
[07:21:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:51:18] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[07:56:38] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 8 below the confidence bounds
[08:03:58] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[08:16:28] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 9 below the confidence bounds
[08:18:12] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[08:22:24] operations, Database: Better mysql monitoring for number of connections and processlist strange patterns - https://phabricator.wikimedia.org/T112473#1652604 (jcrespo) I am afraid not, a simplistic, but accurate way of describing the linux load parameters is "the size of the CPU queue". MySQL will very rare...
[08:22:36] !log elastic in eqiad plugin updates: restarting elastic1014
[08:22:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:26:59] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[08:28:03] (Abandoned) Muehlenhoff: Exclude DNS requests from connection tracking [puppet] - https://gerrit.wikimedia.org/r/238447 (https://phabricator.wikimedia.org/T104968) (owner: Muehlenhoff)
[08:28:50] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[08:30:03] (CR) Alexandros Kosiaris: [C: -1] "Can't say I am not happy we avoided that perlisms. Switching to -1 as well" [puppet] - https://gerrit.wikimedia.org/r/238778 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[08:31:14] (CR) Alexandros Kosiaris: "So we will be enforcing puppet style guide on puppet ruby functions ? That's may lead to more weirdness, I fear." [puppet] - https://gerrit.wikimedia.org/r/238779 (https://phabricator.wikimedia.org/T112651) (owner: Zfilipin)
[08:37:24] (CR) Alexandros Kosiaris: [C: 1] "Yes, we decided to have all bastion hosts be equal in terms of functionality. Plus it makes it easier for users." [puppet] - https://gerrit.wikimedia.org/r/239023 (owner: Dzahn)
[08:38:58] (CR) Alexandros Kosiaris: "If we merge this, be careful about it. Logging all hosts requests might be a bit too much to bear" [puppet] - https://gerrit.wikimedia.org/r/239009 (https://phabricator.wikimedia.org/T97119) (owner: Dzahn)
[08:39:00] (PS1) Muehlenhoff: Remove the Graphite/Diamond based conntrack saturation check [puppet] - https://gerrit.wikimedia.org/r/239332
[08:39:39] (CR) Alexandros Kosiaris: "I meant for carbon, obviously. Both in terms of Disk space as well as IOPS. I have no estimation however, so let's do it" [puppet] - https://gerrit.wikimedia.org/r/239009 (https://phabricator.wikimedia.org/T97119) (owner: Dzahn)
[08:40:49] (PS1) Ori.livneh: Clean-up a small typo [puppet] - https://gerrit.wikimedia.org/r/239333
[08:42:52] (PS2) Ori.livneh: Clean-up a small typo [puppet] - https://gerrit.wikimedia.org/r/239333
[08:43:06] (CR) Ori.livneh: [C: 2 V: 2] Clean-up a small typo [puppet] - https://gerrit.wikimedia.org/r/239333 (owner: Ori.livneh)
[09:11:03] (CR) Steinsplitter: "per ticket" [mediawiki-config] - https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471) (owner: Nemo bis)
[09:25:01] (PS2) Filippo Giunchedi: create application users (WIP) [puppet] - https://gerrit.wikimedia.org/r/239000 (https://phabricator.wikimedia.org/T92590) (owner: Eevans)
[09:32:47] (PS3) Filippo Giunchedi: create application users (WIP) [puppet] - https://gerrit.wikimedia.org/r/239000 (https://phabricator.wikimedia.org/T92590) (owner: Eevans)
[09:33:02] !log elastic in eqiad plugin updates: restarting elastic1015
[09:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:37:02] !log installed openldap security updates on pollux
[09:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:44:06] (PS1) Giuseppe Lavagetto: Add test for ParseCommandLine [debs/pybal] - https://gerrit.wikimedia.org/r/239337
[09:46:19] !log installed openldap security updates on plutonium
[09:46:23] (PS1) Filippo Giunchedi: utils/pcc: jenkinsapi forward/backward compatibility [puppet] - https://gerrit.wikimedia.org/r/239338
[09:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:57:05] (PS4) Filippo Giunchedi: create application users [puppet] - https://gerrit.wikimedia.org/r/239000 (https://phabricator.wikimedia.org/T92590) (owner: Eevans)
[09:57:21] (PS5) Filippo Giunchedi: create application users [puppet] - https://gerrit.wikimedia.org/r/239000 (https://phabricator.wikimedia.org/T92590) (owner: Eevans)
[09:57:28] (CR) Filippo Giunchedi: [C: 2 V: 2] create application users [puppet] - https://gerrit.wikimedia.org/r/239000 (https://phabricator.wikimedia.org/T92590) (owner: Eevans)
[10:02:20] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652730 (Catrope) This is probably just the equivalent of {T102898} for the mobile VCLs.
[10:14:03] (PS1) Filippo Giunchedi: cassandra: provision restbase user/password [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590)
[10:23:04] unless there are objections, I will restart the salt master in 5 minutes
[10:28:33] !log restarted salt-master on palladium
[10:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:30:20] (PS1) Alexandros Kosiaris: Revert "Temporarily disable import_waterlines cronjob" [puppet] - https://gerrit.wikimedia.org/r/239344
[10:31:07] (CR) Alexandros Kosiaris: [C: 2 V: 2] Revert "Temporarily disable import_waterlines cronjob" [puppet] - https://gerrit.wikimedia.org/r/239344 (owner: Alexandros Kosiaris)
[10:31:22] (CR) Alexandros Kosiaris: "reverted in https://gerrit.wikimedia.org/r/#/c/239344/1" [puppet] - https://gerrit.wikimedia.org/r/238993 (owner: MaxSem)
[10:35:21] (CR) Alexandros Kosiaris: [C: 1] Rename analytics1011, 1016, and 1019 to aqs1001, 1002, 1003 [puppet] - https://gerrit.wikimedia.org/r/239177 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[10:35:24] operations: audit all SSL certificates expiry on ops tracking gcal - https://phabricator.wikimedia.org/T112542#1652776 (fgiunchedi) related but slightly tangent to this, we have also other private material that's bound to expire (e.g. puppet CA, gpg keyrings for apt repos, certs for cassandra server/client au...
[11:00:01] operations, MediaWiki-ResourceLoader, Performance-Team, Traffic: [Research] Investigate 30% load.php reqs increase since 2015-07-30 - https://phabricator.wikimedia.org/T113007#1652791 (Catrope) I tried to see if I could simply apply https://gerrit.wikimedia.org/r/#/c/219101/7/templates/varnish/text...
[11:06:07] !log elastic in eqiad plugin updates: restarting elastic1016
[11:06:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:37:09] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1079.67 Read Requests/Sec=357.43 Write Requests/Sec=0.81 KBytes Read/Sec=1429.73 KBytes_Written/Sec=3.24
[11:40:39] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=4.00 Read Requests/Sec=0.00 Write Requests/Sec=0.90 KBytes Read/Sec=0.00 KBytes_Written/Sec=4.00
[11:55:02] !log elastic in eqiad plugin updates: restarting elastic1017
[11:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:58:12] (CR) Alexandros Kosiaris: [C: 2] Rename analytics nodes to aqs (analytics query service), put them in private1 vlans [dns] - https://gerrit.wikimedia.org/r/239175 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[11:58:42] (CR) Alexandros Kosiaris: [C: 1] "+1, feel free to merge at your leisure" [dns] - https://gerrit.wikimedia.org/r/239175 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[11:59:45] operations, hardware-requests, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1652869 (akosiaris) >>! In T111053#1650612, @Ottomata wrote: > Ok! > > https://gerrit.wikimedia.org/r/#/c/239175/ > https://gerrit.wikimedia.org/r/#/c/239177/ LGTM > an...
[12:02:02] (CR) Alexandros Kosiaris: "This is not really needed, but I am intrigued to do it on the grounds of "let's see what happens if we change a services's port and what w" [puppet] - https://gerrit.wikimedia.org/r/238399 (owner: Yurik)
[12:02:10] (PS2) Alexandros Kosiaris: Updated Kartotherian & Tilerator ports [puppet] - https://gerrit.wikimedia.org/r/238399 (owner: Yurik)
[12:03:12] (CR) Alexandros Kosiaris: [C: 2 V: 2] Updated Kartotherian & Tilerator ports [puppet] - https://gerrit.wikimedia.org/r/238399 (owner: Yurik)
[12:12:48] (CR) Glaisher: noindex user namespace on en.wikipedia.org (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/237330 (https://phabricator.wikimedia.org/T104797) (owner: Mdann52)
[12:21:40] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [500.0]
[12:21:52] !log restart logstash on logstash1001, OOM in logs
[12:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:21:59] RECOVERY - logstash process on logstash1001 is OK: PROCS OK: 1 process with UID = 998 (logstash), command name java, args logstash
[12:22:15] ugh, could be related?
[12:22:31] looking
[12:26:18] (CR) Daniel Kinzler: [C: 1] "we want this, but i have no idea whether this is the correct way to do it" [puppet] - https://gerrit.wikimedia.org/r/230483 (https://phabricator.wikimedia.org/T97195) (owner: Smalyshev)
[12:30:15] (CR) JanZerebecki: "Ok, signed up." [puppet] - https://gerrit.wikimedia.org/r/229426 (https://phabricator.wikimedia.org/T84060) (owner: JanZerebecki)
[12:32:44] not related btw, seems mostly from https://tools.wmflabs.org/para/Commons:Special:NewFiles
[12:36:33] Krenair: you online?
[12:37:13] (CR) Alex Monk: "Er.. good point." [mediawiki-config] - https://gerrit.wikimedia.org/r/239308 (https://phabricator.wikimedia.org/T72829) (owner: Alex Monk)
[12:37:19] Steinsplitter, hi
[12:37:20] yep
[12:37:58] Krenair: https://commons.wikimedia.org/w/index.php?title=Commons_talk:Welcome_log&action=history you know from where the titleblacklist reports coming?
[12:38:03] (CR) Alex Monk: "w/static/images/sul/commons.png is already optimized." [mediawiki-config] - https://gerrit.wikimedia.org/r/239308 (https://phabricator.wikimedia.org/T72829) (owner: Alex Monk)
[12:39:21] Steinsplitter, umm... I actually have no idea where those are from
[12:40:49] Isn't that from the titleblacklist?
[12:41:11] https://commons.wikimedia.org/wiki/MediaWiki:Titleblacklist
[12:41:25] There are many entries with the same errmsg
[12:41:28] yes, but the seemingly automatic reporting?
[12:41:54] and i am wondering why they are reported on Commons talk:Welcome log
[12:42:01] Someone has something on an interface message
[12:42:14] It's done for abusefilter entries on many wikis
[12:42:20] but I don't know about titleblacklist
[12:42:45] Does TitleBlacklist do this reporting automatically?
[12:43:32] no, it doesn't afaik
[12:43:58] https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Commons+talk%3AWelcome+log&namespace=10
[12:44:07] https://commons.wikimedia.org/w/index.php?title=Special%3AWhatLinksHere&target=Commons+talk%3AWelcome+log&namespace=8
[12:44:19] So maybe it's some sort of gadget?
[12:44:26] But probably an externallink with new section stuff
[12:44:45] i checked them. i can't find from where it is coming
[12:45:15] Think I'm on to something in the logs
[12:45:36] (Abandoned) Alexandros Kosiaris: maps: Setup LVS for public entrypoint [puppet] - https://gerrit.wikimedia.org/r/225716 (owner: Alexandros Kosiaris)
[12:46:11] HotCat queries info about this page?
[12:46:22] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:55:31] hm, so it is likely a third party tool/gadget. Thanks for helping Krenair and Glaisher (will try now to find out which one :-))!
[12:55:52] Not necessarily, but I can't think of any extension that does this
[12:57:53] UploadWizard
[12:57:53] https://github.com/wikimedia/mediawiki-extensions-UploadWizard/blob/master/i18n/en.json#L316
[12:58:28] Reedy: Thanks! :)
[12:59:12] operations, Commons: delete gwtoolset job by Hansmuller - https://phabricator.wikimedia.org/T112878#1652997 (Hansmuller) Thank you cancelling my job! I just panicked. In the future i will keep cool an keep hecking things beforehand. I hope this incident will not affect my ability to upload speedily. Than...
[12:59:29] (Abandoned) Thcipriani: Allow override of sync_common config [puppet] - https://gerrit.wikimedia.org/r/198173 (https://phabricator.wikimedia.org/T91548) (owner: Thcipriani)
[12:59:32] I was thinking it was probably was UW, and after mwgrep showed nothing I had a look
[12:59:33] xD
[13:01:09] interesting
[13:01:55] But why that welcome log page?
[13:02:14] (CR) Alexandros Kosiaris: "This turned out to be:" [puppet] - https://gerrit.wikimedia.org/r/238399 (owner: Yurik)
[13:02:15] it's using mw.Feedback, with new mw.Title( mw.UploadWizard.config.blacklistIssuesPage )
[13:02:28] git blame and ask?
[13:02:30] wmf-config/CommonSettings.php: $wgUploadWizardConfig['blacklistIssuesPage'] = 'Commons:Upload_Wizard_blacklist_issues'; # Set by neilk, 2011-11-01, per erik
[13:02:43] Is editing brokne?
[13:02:45] aude: ^
[13:03:24] hoo, umm... I see edits going through on enwiki
[13:03:36] I also see some on Wikidata, but extremely few
[13:03:41] and other wikis
[13:05:21] what's wrong?
[13:05:26] Steinsplitter, UploadWizard should be sending those to Commons:Upload_Wizard_blacklist_issues
[13:05:33] abusefilter ?
[13:05:36] Krenair: Exactly.
[13:05:44] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 13.64% of data above the critical threshold [100000000.0]
[13:05:49] aude: Very few edits... suspiciously few
[13:05:55] on commons?
[13:06:02] Wikidata
[13:06:10] oh
[13:06:16] can we do a backport today?
[13:06:21] * aude suspects we have a fix
[13:06:29] We had 8 edits/ minute
[13:06:35] but seems back to normal now
[13:06:37] https://phabricator.wikimedia.org/T112070
[13:06:59] aude: That breaks editing?
[13:07:31] let me look
[13:07:47] it's just a guess and something safe / probably worth to backport
[13:08:25] Yeah, it should be safe
[13:09:47] i am not sure that' sthe isue though
[13:10:23] I looked into the logs very quick and can't see anything
[13:10:47] * aude sees one entry for autocomment
[13:10:47] Maybe it's just an extremely slow day?
[13:11:06] (PS1) JanZerebecki: Make link in dataset relative [puppet] - https://gerrit.wikimedia.org/r/239367 (https://phabricator.wikimedia.org/T112892)
[13:12:41] i am preparing a backport
[13:12:45] !log elastic in eqiad plugin updates: restarting elastic1018
[13:12:52] it will also add some debug logging for when we have to unstub
[13:12:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:13:10] in previous cases we had such issue, abuse filter was the problem
[13:13:53] Krenair: You know where to fix this?
[13:14:12] Steinsplitter, I'm not sure why it's happening
[13:14:35] okay, i file a report on phab. thans for helping.
[13:14:54] * aude not totally clear what the problem is
[13:15:05] explaining on phabricator would help
[13:15:18] (CR) Hoo man: "Had a quick look, please double check" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/239367 (https://phabricator.wikimedia.org/T112892) (owner: JanZerebecki)
[13:16:06] !log commiting eqiad lvs1007-12 port/vlan changes for asw2-a5-eqiad
[13:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:16:54] akosiaris: i just googled a tiny bit about what disk layout is good for cassandra
[13:17:05] i don't know much here, but i guess either jbod or raid 0?
[13:17:13] aude: https://www.wikidata.org/wiki/Special:AbuseLog doesn't seem more than usual to me
[13:17:26] * hoo looks in for private filters
[13:17:31] hoo: the problem is blacklist error reports?
[13:17:39] * logs in/ looks into
[13:17:49] (CR) JanZerebecki: Make link in dataset relative (1 comment) [puppet] - https://gerrit.wikimedia.org/r/239367 (https://phabricator.wikimedia.org/T112892) (owner: JanZerebecki)
[13:17:50] ottomata: Is it read-heavy or write-heavy as a rule?
[13:17:51] huh?
[13:18:08] hoo: i don't understand what problem you are referring to
[13:18:12] ottomata: Because if read-heavy you can win a lot with raid 10 - especially if you can thow disks at it.
[13:18:19] Coren: I don't know, i think it will take mostly batch writes, and then regular reads
[13:18:25] but the autocomment thing is worth to backport anyway and safe
[13:18:33] Coren: the 3 nodes we are using each have 12 disks
[13:18:55] raid10 would be nice, just for the simplicity of recovery on disk failure
[13:19:06] but, i suppose cassandra makes that easy anyway?
[13:19:07] (CR) Hoo man: [C: 1] Make link in dataset relative (1 comment) [puppet] - https://gerrit.wikimedia.org/r/239367 (https://phabricator.wikimedia.org/T112892) (owner: JanZerebecki)
[13:19:33] ottomata: Well, raid10 means there is no /need/ for recovery for single-disk failiures.
[13:19:36] aude: Well, I just think that 8 edits/ minute seem extremely low to me
[13:19:47] thus I suspected that we had problems
[13:19:53] ottomata: But the biggest advantage of the mirrors is that it nearly doubles your read bandwidth.
[13:20:01] But it might just be a very slow day
[13:20:11] !log committing eqiad lvs1007-12 port/vlan changes for asw-c-eqiad
[13:20:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:20:45] hoo: ok
[13:20:53] nothing to do with autocomments
[13:20:59] * aude might save that to monday
[13:21:17] hm, ya, i dunno, ok! raid 10 is fine with me! :) someone can object in review :)
[13:24:42] ottomata: how much disk space is there on these hosts ?
[13:25:17] (CR) Filippo Giunchedi: [C: -1] "the same "%{passwords::}" syntax is also used in" [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590) (owner: Filippo Giunchedi)
[13:25:26] akosiaris: 1.2T x 12 disks
[13:25:55] hoo: no problem editing (e.g. wikidata game or in the ui)
[13:25:56] ah sorry
[13:25:57] no
[13:26:00] and my edits appear
[13:26:01] 2T x 12 disks
[13:26:07] so effectively 12TB per host
[13:26:13] with RAID10 that is
[13:26:15] yeah
[13:26:23] akosiaris: the other restbase servers do raid 0
[13:26:25] but that is on ssds
[13:26:49] aude: mh... in that case it's probably just a very slow day :D
[13:26:55] Everyone's traveling
[13:26:58] :P
[13:27:00] :(
[13:27:05] wikicon
[13:27:15] all the editors are comign to wikicon!
[13:27:24] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[13:27:25] ottomata: yeah, due to cassandra replication, there is no need for RAID level != RAID0
[13:28:00] aye
[13:28:07] ok so, with 10Gb input per day, that should suffice for like 3 years
[13:28:20] by which we expect to have replaced those boxes
[13:28:27] actually 2 years before that time limit
[13:28:58] so even with twice or thrice that per day we should be ok
[13:29:30] k
[13:29:33] ottomata: I think it should be fine for now. worst case scenario, we depool one host and reinstall it
[13:29:41] so raid 10 lvm on /var
[13:29:49] ok
[13:31:07] (PS1) Alexandros Kosiaris: WIP: modularize otrs [puppet] - https://gerrit.wikimedia.org/r/239369
[13:32:02] (CR) jenkins-bot: [V: -1] WIP: modularize otrs [puppet] - https://gerrit.wikimedia.org/r/239369 (owner: Alexandros Kosiaris)
[13:35:05] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: puppet fail
[13:36:05] ^ checking
[13:36:54] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[13:36:55] just random puppetmasterfail
[13:37:14] (PS1) Ottomata: Add cassandrahosts-12hdd.cfg partman recipe and test on d-i-test [puppet] - https://gerrit.wikimedia.org/r/239371
[13:37:51] operations, Analytics-Kanban, hardware-requests, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653138 (Ottomata)
[13:38:07] (PS2) Ottomata: Add cassandrahosts-12hdd.cfg partman recipe and test on d-i-test [puppet] - https://gerrit.wikimedia.org/r/239371 (https://phabricator.wikimedia.org/T111053)
[13:38:19] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653145 (Multichill) This is getting annoying. Can someone just patch this up to get rid of this junk?
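The capacity estimate discussed above (12 disks of 2 TB each, RAID10, roughly 10 GB of new data per day) can be checked with a few lines of arithmetic. This is a minimal back-of-the-envelope sketch, not anyone's actual sizing tool. It assumes "10Gb" means 10 gigabytes per day landing on each host and uses decimal units (1 TB = 1000 GB). It ignores filesystem overhead, Cassandra compaction headroom and the replication factor. The function names are hypothetical.

```python
# Back-of-the-envelope disk sizing for the Cassandra hosts discussed above.
# Assumptions (not stated in the log): "10Gb" is read as 10 GB/day per host,
# decimal units are used, and filesystem/compaction overhead is ignored.

DISKS_PER_HOST = 12
DISK_TB = 2.0
INGEST_GB_PER_DAY = 10.0


def usable_tb(disks: int, disk_tb: float, raid_level: int) -> float:
    """Usable capacity for RAID0 (striping only) or RAID10 (striped mirrors)."""
    raw = disks * disk_tb
    if raid_level == 0:
        return raw
    if raid_level == 10:
        return raw / 2  # every block is stored on a mirrored pair
    raise ValueError(f"unsupported RAID level: {raid_level}")


def years_until_full(capacity_tb: float, gb_per_day: float) -> float:
    """How long a steady daily ingest takes to fill the given capacity."""
    return capacity_tb * 1000 / gb_per_day / 365


if __name__ == "__main__":
    for level in (0, 10):
        cap = usable_tb(DISKS_PER_HOST, DISK_TB, level)
        for factor in (1, 2, 3):
            years = years_until_full(cap, INGEST_GB_PER_DAY * factor)
            print(f"RAID{level}: {cap:.0f} TB usable, {factor}x ingest -> {years:.1f} years")
```

Under these assumptions RAID10 leaves 12 TB per host, which lasts about 3.3 years at 10 GB/day. That matches the "like 3 years" estimate. Twice or three times that ingest shortens the horizon to roughly 1.6 and 1.1 years.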
[13:38:30] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653147 (Multichill) p:Normal>High
[13:39:19] (PS3) Ottomata: Add cassandrahosts-12hdd.cfg partman recipe and test on d-i-test [puppet] - https://gerrit.wikimedia.org/r/239371 (https://phabricator.wikimedia.org/T111053)
[13:39:48] (CR) Ottomata: [C: 2 V: 2] Add cassandrahosts-12hdd.cfg partman recipe and test on d-i-test [puppet] - https://gerrit.wikimedia.org/r/239371 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[13:40:59] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653157 (Dzahn) @Multichill Did you try what i suggested in my comment right above?
[13:47:04] PROBLEM - Restbase endpoints health on xenon is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=127.0.0.1, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[13:47:45] PROBLEM - Restbase root url on xenon is CRITICAL: Connection refused
[13:51:17] PARTMAN IS DIFFICULT
[13:51:40] i think i cannot write a raid recipe that uses different numbers of drives for different arrays
[13:51:52] akosiaris: ^?
[13:52:30] i think i have tried and failed at this before. Gah
[13:52:46] <_joe_> ottomata: yeah I was admired by your partman-fu earlier
[13:54:24] *I* have partman-fu?
[13:54:28] you must be mistaken.
[13:54:29] hah
[13:54:41] or you mean my attempt just now? :p
[13:54:55] <_joe_> no I mean, I wouldn't dare try to do something like that with partman
[13:54:58] to achieve partman-fu, one must leave sanity so far behind that you forget what the word means.
[13:55:00] <_joe_> I meant just now :P
[13:55:44] oh ja, i am giving up.
[13:55:58] i will just script somethign up that can be copy/pasted from wikitech
[13:56:04] like i do for analytics* and kafka* nodes
[13:56:35] !log committing eqiad lvs1007-1012 port/vlan changes for asw-b-eqiad
[13:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:56:43] RECOVERY - Restbase root url on xenon is OK: HTTP OK: HTTP/1.1 200 - 15229 bytes in 0.021 second response time
[13:57:09] ottomata: you have two spares in the raid1 on / btw
[13:57:20] and only two disks, that might be it too
[13:57:44] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[13:59:12] the author of partman admits that himself in a recent blog posting: http://joeyh.name/blog/entry/then_and_now/ :-)
[13:59:12] hm, maybe godog
[13:59:15] Partman is in some ways a beautful piece of work, a mass of semi-object-oriented, super extensible shell code that sprang fully formed from the brow of Anton. And in many ways, it's mad, full of sector alignment twiddling math implemented in tens of thousands of lines of shell script scattered amoung hundreds of tiny files that are impossible to keep straight.
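godog's diagnosis (a RAID1 on / declaring two spares while only two disks are listed) is the kind of count mismatch that can be caught before a reinstall. The helper below is hypothetical, not part of partman or the puppet repo: it assumes the partman-auto-raid field order given in Debian's example preseed, uses a simplified whitespace tokenizer, and only checks that each entry's declared device and spare counts match the devices actually listed. The recipe string is made up for illustration and is not the real cassandrahosts-12hdd.cfg.

```python
# Minimal consistency check for a partman-auto-raid recipe string.
# Field order follows the comment in Debian's example preseed:
#   <raidtype> <devcount> <sparecount> <fstype> <mountpoint> <devices> [<sparedevices>] .
# Devices inside a field are joined with "#". Simplifications (assumptions):
# fields are whitespace-separated, "." terminates each array entry, and an
# omitted <sparedevices> field means "no spares listed".
from typing import List


def check_raid_recipe(recipe: str) -> List[str]:
    problems: List[str] = []
    entries: List[List[str]] = []
    current: List[str] = []
    for tok in recipe.split():
        if tok == ".":
            entries.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        problems.append("trailing entry not terminated with '.'")

    for entry in entries:
        if len(entry) not in (6, 7):
            problems.append(f"malformed entry: {' '.join(entry)}")
            continue
        level, devcount, sparecount, _fstype, mount, devices = entry[:6]
        members = devices.split("#")
        spares = entry[6].split("#") if len(entry) == 7 else []
        if int(devcount) != len(members):
            problems.append(f"{mount}: RAID{level} devcount={devcount} "
                            f"but {len(members)} devices listed")
        if int(sparecount) != len(spares):
            problems.append(f"{mount}: RAID{level} sparecount={sparecount} "
                            f"but {len(spares)} spare devices listed")
    return problems


# Hypothetical recipe: a RAID1 on / that asks for 2 spares without listing
# any, plus a 12-member RAID10 handed to LVM.
recipe = (
    "1 2 2 ext4 / /dev/sda1#/dev/sdb1 . "
    "10 12 0 lvm - " + "#".join(f"/dev/sd{c}2" for c in "abcdefghijkl") + " ."
)
for problem in check_raid_recipe(recipe):
    print(problem)
```

Only the RAID1 entry is flagged here; the twelve-member entry's counts agree with its device list.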
[13:59:38] mutante: hi
[13:59:45] might as well try
[14:00:23] PROBLEM - Host mw1126 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:23] PROBLEM - Host mw1117 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:26] paravoid: :) hi, just scheduled downtime for sodium
[14:00:27] fuck me
[14:00:33] PROBLEM - Host mw1120 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:33] PROBLEM - Host mw1114 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:33] PROBLEM - Host mw1115 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:34] PROBLEM - Host mw1124 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:34] PROBLEM - Host mw1118 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:34] PROBLEM - Host mw1119 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:34] PROBLEM - Host mw1107 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:35] PROBLEM - Host mw1101 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:35] PROBLEM - Host mw1108 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:36] PROBLEM - Host mw1127 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:36] PROBLEM - Host mw1110 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:37] PROBLEM - Host mw1103 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:37] PROBLEM - Host mw1098 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:38] PROBLEM - Host mw1128 is DOWN: PING CRITICAL - Packet loss = 100%
[14:00:50] bblack: you I guess? :)
[14:00:55] PROBLEM - Host mw1100 is DOWN: PING CRITICAL - Packet loss = 100%
[14:01:01] !log rollback on asw-b-eqiad changes above
[14:01:04] PROBLEM - Host mw1123 is DOWN: PING CRITICAL - Packet loss = 100%
[14:01:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:02:04] RECOVERY - Host mw1097 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms
[14:02:04] RECOVERY - Host mw1109 is UP: PING OK - Packet loss = 0%, RTA = 0.82 ms
[14:02:04] RECOVERY - Host mw1111 is UP: PING OK - Packet loss = 0%, RTA = 1.02 ms
[14:02:04] RECOVERY - Host mw1098 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[14:02:04] RECOVERY - Host mw1107 is UP: PING OK - Packet loss = 0%, RTA = 1.79 ms
[14:02:05] RECOVERY - Host mw1113 is UP: PING OK - Packet loss = 0%, RTA = 1.09 ms
[14:02:05] RECOVERY - Host mw1118 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms
[14:02:06] RECOVERY - Host mw1122 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms
[14:02:06] RECOVERY - Host mw1114 is UP: PING OK - Packet loss = 0%, RTA = 1.77 ms
[14:02:07] RECOVERY - Host mw1125 is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms
[14:02:07] RECOVERY - Host mw1115 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[14:02:08] RECOVERY - Host mw1119 is UP: PING OK - Packet loss = 0%, RTA = 5.33 ms
[14:02:10] RECOVERY - Host mw1127 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms
[14:02:10] RECOVERY - Host mw1108 is UP: PING OK - Packet loss = 0%, RTA = 3.52 ms
[14:02:23] RECOVERY - Host mw1120 is UP: PING OK - Packet loss = 0%, RTA = 0.89 ms
[14:02:24] RECOVERY - Host mw1126 is UP: PING OK - Packet loss = 0%, RTA = 2.22 ms
[14:02:24] RECOVERY - Host mw1110 is UP: PING OK - Packet loss = 0%, RTA = 1.47 ms
[14:02:24] RECOVERY - Host mw1112 is UP: PING OK - Packet loss = 0%, RTA = 2.60 ms
[14:02:43] (PS1) Ottomata: Try again with cassandrahosts-12hdd.cfg [puppet] - https://gerrit.wikimedia.org/r/239376 (https://phabricator.wikimedia.org/T111053)
[14:03:23] (CR) Ottomata: [C: 2 V: 2] Try again with cassandrahosts-12hdd.cfg [puppet] - https://gerrit.wikimedia.org/r/239376 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[14:04:12] moritzm: any efforts to make something better on the horizon?
[14:04:18] (than partman?)
[14:04:23] (PS2) Filippo Giunchedi: cassandra: provision restbase user/password [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590)
[14:05:45] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [500.0]
[14:05:54] PROBLEM - puppet last run on mw1098 is CRITICAL: CRITICAL: puppet fail
[14:06:05] ottomata: I'm afraid not, d-i development is mostly in maintenance mode
[14:06:10] (PS1) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[14:06:29] (CR) jenkins-bot: [V: -1] webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[14:07:04] PROBLEM - puppet last run on mw1109 is CRITICAL: CRITICAL: puppet fail
[14:07:19] !log stopped mailman on sodium
[14:07:24] PROBLEM - puppet last run on mw1097 is CRITICAL: CRITICAL: puppet fail
[14:07:26] the mw* puppetfails are just delayed fallout from the switch stuff earlier, they'll self-recover
[14:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:07:44] PROBLEM - puppet last run on mw1102 is CRITICAL: CRITICAL: Puppet has 19 failures
[14:08:26] (CR) Filippo Giunchedi: [C: 2 V: 2] "including ::passwords::cassandra as suggested by _joe_ helped!" [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590) (owner: Filippo Giunchedi)
[14:08:34] ... seriously? "E225 missing whitespace around operator"
[14:08:44] (PS3) Filippo Giunchedi: cassandra: provision restbase user/password [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590)
[14:09:03] * Coren mumbles evil thing about blind adherence to context-free style guidelines.
[14:09:19] (CR) Filippo Giunchedi: [V: 2] cassandra: provision restbase user/password [puppet] - https://gerrit.wikimedia.org/r/239341 (https://phabricator.wikimedia.org/T92590) (owner: Filippo Giunchedi)
[14:09:32] (PS2) Ottomata: Rename analytics1011, 1016, and 1019 to aqs1001, 1002, 1003 [puppet] - https://gerrit.wikimedia.org/r/239177 (https://phabricator.wikimedia.org/T111053)
[14:09:44] (PS2) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[14:10:20] !log elastic in eqiad plugin updates: restarting elastic1019
[14:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:10:55] (PS3) Ottomata: Rename analytics1011, 1016, and 1019 to aqs1001, 1002, 1003 [puppet] - https://gerrit.wikimedia.org/r/239177 (https://phabricator.wikimedia.org/T111053)
[14:11:13] !log sodium - stopped exim - rsyncing lists to fermium
[14:11:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:11:42] (CR) Ottomata: [C: 2] Rename analytics nodes to aqs (analytics query service), put them in private1 vlans [dns] - https://gerrit.wikimedia.org/r/239175 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[14:12:35] (PS3) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[14:12:50] (PS4) Ottomata: Rename analytics1011, 1016, and 1019 to aqs1001, 1002, 1003 [puppet] - https://gerrit.wikimedia.org/r/239177 (https://phabricator.wikimedia.org/T111053)
[14:14:24] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=961.38 Read Requests/Sec=1182.60 Write Requests/Sec=2.52 KBytes Read/Sec=4732.39 KBytes_Written/Sec=12.07
[14:14:37] !log committing lvs1007-12 port/vlan changes for asw-b-eqiad, round 3...
[14:14:41] (CR) Ottomata: [C: 2 V: 2] Rename analytics1011, 1016, and 1019 to aqs1001, 1002, 1003 [puppet] - https://gerrit.wikimedia.org/r/239177 (https://phabricator.wikimedia.org/T111053) (owner: Ottomata)
[14:14:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:14:54] ok akosiaris
[14:15:03] I will partition these manually after reinstall
[14:15:09] i've merged both patches
[14:15:15] and done authdns-update
[14:15:27] i think you have to flip some switch switches now?
[14:18:33] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:18:50] (CR) Giuseppe Lavagetto: "If you want to change the 'cluster' variable in puppet, you will need to:" [puppet] - https://gerrit.wikimedia.org/r/238431 (https://phabricator.wikimedia.org/T112644) (owner: Filippo Giunchedi)
[14:19:45] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.30 Read Requests/Sec=0.00 Write Requests/Sec=4.80 KBytes Read/Sec=0.00 KBytes_Written/Sec=21.18
[14:22:38] !log committing lvs1007-1012 port/vlan changes for asw-d-eqiad (but leaving all 6 LVS ports in "disabled" state - T112781 )
[14:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:25:14] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=849.59 Read Requests/Sec=2074.74 Write Requests/Sec=5.83 KBytes Read/Sec=9401.23 KBytes_Written/Sec=1153.37
[14:27:03] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=1.20 Read Requests/Sec=8.92 Write Requests/Sec=0.50 KBytes Read/Sec=42.08 KBytes_Written/Sec=2.00
[14:29:24] RECOVERY - puppet last run on mw1102 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[14:30:01] operations, Analytics-Kanban, hardware-requests, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653269 (Ottomata) Ok, I will have to manually partition these, partman is too dumb. Alex, proceed with VLAN changes! Then we can reinstall.
[14:31:23] RECOVERY - puppet last run on mw1098 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:32:25] RECOVERY - puppet last run on mw1109 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[14:32:53] RECOVERY - puppet last run on mw1097 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:34:25] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1813.16 Read Requests/Sec=2390.63 Write Requests/Sec=3.75 KBytes Read/Sec=9573.78 KBytes_Written/Sec=16.65
[14:35:59] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653277 (Multichill) >>! In T112912#1653157, @Dzahn wrote: > @Multichill Did you try what i suggested in my comment right above? I'm admin for quite a few lists. An...
[14:36:11] operations, Discovery, codfw-rollout: Set up a CirrusSearch cluster in codfw (Dallas, Texas) - https://phabricator.wikimedia.org/T105703#1449703 (chasemp)
[14:36:12] operations, Discovery: Rollout CirrusSearch to codfw as a backup data centre - https://phabricator.wikimedia.org/T105711#1653280 (chasemp) Open>Invalid a:chasemp I can't think of anything this task would relate to specifically which we are not tracking elsewhere.
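The graphite1001 alerts above report the share of recent datapoints over a threshold rather than a single value; "16.67% of data above the critical threshold [500.0]" would be, for example, one point out of six. The sketch below is a simplified model of that logic, not the actual check plugin. The 250/500 thresholds come from the alert texts, the 1% cutoff is inferred from "Less than 1.00% above the threshold", and skipping `None` values (missing Graphite datapoints) and returning UNKNOWN when nothing is left are assumptions.

```python
# Simplified model of a "percentage of datapoints over threshold" alert.
# Assumptions, not taken from the real check: thresholds 250 (warning) and
# 500 (critical) as printed in the alerts, a 1% cutoff inferred from the
# recovery text, None entries treated as missing and skipped.
from typing import Optional, Sequence, Tuple


def percent_over(points: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-missing datapoints strictly above threshold."""
    valid = [p for p in points if p is not None]
    if not valid:
        return 0.0
    return 100.0 * sum(p > threshold for p in valid) / len(valid)


def classify(points: Sequence[Optional[float]], warning: float = 250.0,
             critical: float = 500.0, cutoff_pct: float = 1.0) -> Tuple[str, str]:
    if all(p is None for p in points):
        return "UNKNOWN", "no valid datapoints"
    crit = percent_over(points, critical)
    if crit >= cutoff_pct:
        return "CRITICAL", f"{crit:.2f}% of data above the critical threshold [{critical}]"
    warn = percent_over(points, warning)
    if warn >= cutoff_pct:
        return "WARNING", f"{warn:.2f}% of data above the warning threshold [{warning}]"
    return "OK", f"Less than {cutoff_pct:.2f}% above the threshold [{warning}]"


# Hypothetical samples: one 5xx/min value out of six over 500 yields the
# same wording as the 14:05:45 alert.
print(classify([120.0, 130.0, 610.0, 140.0, 150.0, 160.0]))
print(classify([100.0] * 10))
```

Measuring a fraction of the window, rather than alerting on any single sample, keeps one isolated spike from paging when the cutoff is set above a single point's share.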
[14:38:41] operations, Discovery, codfw-rollout: [EPIC] Set up a CirrusSearch cluster in codfw (Dallas, Texas) - https://phabricator.wikimedia.org/T105703#1653286 (chasemp) p:High>Normal
[14:39:18] operations, CirrusSearch, Discovery: Only use newer (elastic10{16..31}) servers as master capable elasticsearch nodes - https://phabricator.wikimedia.org/T112556#1653290 (chasemp)
[14:39:39] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653293 (JohnLewis) p:High>Normal a:JohnLewis
[14:39:41] operations, RESTBase-Cassandra, Patch-For-Review: consider moving Cassandra to G1GC in production - https://phabricator.wikimedia.org/T103161#1653294 (fgiunchedi) Open>Resolved we're running g1gc everywhere
[14:40:17] operations, CirrusSearch, Discovery: Only use newer (elastic10{16..31}) servers as master capable elasticsearch nodes - https://phabricator.wikimedia.org/T112556#1653296 (chasemp) a:chasemp>None I am not going to get to this within the week or even next week. Placing up for grabs to reflect that...
[14:43:10] (PS1) BBlack: fixup lvs1007-12 eth0 vlan ids [puppet] - https://gerrit.wikimedia.org/r/239378 (https://phabricator.wikimedia.org/T104458)
[14:43:18] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653302 (JohnLewis) First comment for the sake of everyone: please can people stop commenting things that add no value or enhance the ability for us to look into the...
[14:45:06] (CR) BBlack: [C: 2] fixup lvs1007-12 eth0 vlan ids [puppet] - https://gerrit.wikimedia.org/r/239378 (https://phabricator.wikimedia.org/T104458) (owner: BBlack)
[14:47:51] (PS1) Dzahn: mailman: fix path to qfiles [puppet] - https://gerrit.wikimedia.org/r/239380
[14:55:46] (PS2) Dzahn: mailman: fix path to qfiles [puppet] - https://gerrit.wikimedia.org/r/239380
[14:55:57] !log elastic in eqiad plugin updates: restarting elastic1020
[14:56:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:56:17] (CR) Dzahn: [C: 2] mailman: fix path to qfiles [puppet] - https://gerrit.wikimedia.org/r/239380 (owner: Dzahn)
[15:00:03] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653327 (Dzahn)
[15:00:04] operations, Wikimedia-Mailing-lists: shut down mailman on sodium - https://phabricator.wikimedia.org/T110137#1653326 (Dzahn) Open>Resolved
[15:00:22] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1450913 (Dzahn)
[15:00:23] operations, Wikimedia-Mailing-lists, Patch-For-Review: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1653328 (Dzahn) Open>Resolved
[15:00:31] operations, Wikimedia-Mailing-lists, Patch-For-Review: rsync exim spool directory - https://phabricator.wikimedia.org/T110440#1653330 (Dzahn) Open>Resolved
[15:00:33] operations, Wikimedia-Mailing-lists, Patch-For-Review: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1450915 (Dzahn)
[15:00:51] operations, Wikimedia-Mailing-lists: rsync exim spool directory - https://phabricator.wikimedia.org/T110440#1577996 (Dzahn)
[15:02:15] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653335 (Dzahn)
[15:02:35] operations, Wikimedia-Mailing-lists: rsync the diff since mail was held on sodium - https://phabricator.wikimedia.org/T110138#1653337 (Dzahn)
[15:02:47] operations, Wikimedia-Mailing-lists: fermium needs to have exim4-daemon-heavy installed, not -light - https://phabricator.wikimedia.org/T112229#1653339 (Dzahn)
[15:07:43] (PS2) Dzahn: mailman: set new settings to improve security [puppet] - https://gerrit.wikimedia.org/r/235384 (owner: John F. Lewis)
[15:08:11] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653355 (Multichill) Hi John, >>! In T112912#1653302, @JohnLewis wrote: > First comment for the sake of everyone: please can people stop commenting things that add...
[15:08:20] akosiaris: any chance you'll be able to get to vlan changes today?
[15:10:06] (CR) Dzahn: [C: 2] mailman: set new settings to improve security [puppet] - https://gerrit.wikimedia.org/r/235384 (owner: John F. Lewis)
[15:10:16] (PS1) BBlack: fixup lvs1007-12 eth0 netmasks [puppet] - https://gerrit.wikimedia.org/r/239387 (https://phabricator.wikimedia.org/T104458)
[15:10:30] (PS2) BBlack: fixup lvs1007-12 eth0 netmasks [puppet] - https://gerrit.wikimedia.org/r/239387 (https://phabricator.wikimedia.org/T104458)
[15:10:43] (CR) BBlack: [C: 2 V: 2] fixup lvs1007-12 eth0 netmasks [puppet] - https://gerrit.wikimedia.org/r/239387 (https://phabricator.wikimedia.org/T104458) (owner: BBlack)
[15:11:15] mutante: you can do mine whenever you're ready for yours
[15:11:35] bblack: i just did
[15:11:37] both merged
[15:15:15] phab down or just me?
[15:15:33] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653382 (JohnLewis) >>! In T112912#1653355, @Multichill wrote: > Hi John, > >>>! In T112912#1653302, @JohnLewis wrote: >> First comment for the sake of everyone: pl...
[15:16:09] gerrit too… okay, i guess it's just me
[15:16:46] RIP lists.wikimedia.org
[15:16:54] pinging phabricator.wikimedia.org [208.80.154.241] and gerrit.wikimedia.org [208.80.154.81] times out for me. any ideas why?
[15:17:13] They seem to work for me, maybe it's a regional issue
[15:17:46] mutante: I don't know if your new settings might have brought down mailman...it's not responding to me
[15:18:12] gerrit and phab both work for me
[15:18:21] marktraceur: it's down because we are moving it right now
[15:18:24] Ah, K.
[15:20:57] !log create restbase user on cassandra test cluster
[15:21:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:23:47] (PS1) Filippo Giunchedi: cassandra: fix adduser.cql statements [puppet] - https://gerrit.wikimedia.org/r/239389
[15:24:16] (PS1) Giuseppe Lavagetto: Use argparse instead of getopt [debs/pybal] - https://gerrit.wikimedia.org/r/239390
[15:24:18] (PS1) Giuseppe Lavagetto: Remove daemonization options [debs/pybal] - https://gerrit.wikimedia.org/r/239391
[15:24:20] (PS1) Giuseppe Lavagetto: Remove LogFile [debs/pybal] - https://gerrit.wikimedia.org/r/239392
[15:24:22] (PS1) Giuseppe Lavagetto: Add systemd support, remove sysvinit support [debs/pybal] - https://gerrit.wikimedia.org/r/239393
[15:26:28] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653423 (Dzahn)
[15:26:30] operations, Wikimedia-Mailing-lists, Patch-For-Review: run /var/lib/mailman/bin/update and ./check_perms - https://phabricator.wikimedia.org/T113020#1653422 (Dzahn) Open>Resolved
[15:26:51] operations, Wikimedia-Mailing-lists: test sending individual mails from fermium during migration - https://phabricator.wikimedia.org/T110441#1653424 (Dzahn) Open>Resolved sent a mail to postmaster@ and it arrived
[15:26:52] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1465090 (Dzahn)
[15:29:31] operations, Wikimedia-Mailing-lists: start exim on fermium / revert migration hack - https://phabricator.wikimedia.org/T113045#1653428 (Dzahn) NEW a:Dzahn
[15:30:37] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653439 (Dzahn)
[15:33:04] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653443 (Revi) >>! In T112912#1653382, @JohnLewis wrote: > >>>This is extremely low priority right now considering the mailman migration and that follow up (securit...
[15:33:38] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: fix adduser.cql statements [puppet] - https://gerrit.wikimedia.org/r/239389 (owner: Filippo Giunchedi)
[15:34:03] (PS2) Dzahn: switch lists IP from sodium to fermium [dns] - https://gerrit.wikimedia.org/r/233642 (https://phabricator.wikimedia.org/T110139)
[15:36:05] (PS3) Dzahn: switch lists IP from sodium to fermium [dns] - https://gerrit.wikimedia.org/r/233642 (https://phabricator.wikimedia.org/T110139)
[15:36:12] (CR) Giuseppe Lavagetto: [C: 2] "Merging as this change is tests-only" [debs/pybal] - https://gerrit.wikimedia.org/r/239337 (owner: Giuseppe Lavagetto)
[15:36:58] (CR) John F. Lewis: [C: 1] "we're on the road to throwing sodium in water (metaphorically, servers and waters don't mix)" [dns] - https://gerrit.wikimedia.org/r/233642 (https://phabricator.wikimedia.org/T110139) (owner: Dzahn)
[15:37:01] (CR) Dzahn: [C: 2] switch lists IP from sodium to fermium [dns] - https://gerrit.wikimedia.org/r/233642 (https://phabricator.wikimedia.org/T110139) (owner: Dzahn)
[15:37:15] (PS1) Filippo Giunchedi: restbase: add cassandra password for test cluster [puppet] - https://gerrit.wikimedia.org/r/239398 (https://phabricator.wikimedia.org/T92590)
[15:37:36] operations, ops-codfw, netops: cr1-eqdfw PEM 0 failure - https://phabricator.wikimedia.org/T110435#1653450 (Papaul) Open>Resolved a:Papaul PEM 0 replacement compete.
[15:39:31] (Merged) jenkins-bot: Add test for ParseCommandLine [debs/pybal] - https://gerrit.wikimedia.org/r/239337 (owner: Giuseppe Lavagetto)
[15:41:39] (CR) John F. Lewis: [C: -1] "exim?" [puppet] - https://gerrit.wikimedia.org/r/238650 (https://phabricator.wikimedia.org/T110256) (owner: Dzahn)
[15:42:03] (CR) John F. Lewis: [C: 1] add IPv6 for ytterbium (gerrit) [puppet] - https://gerrit.wikimedia.org/r/214437 (https://phabricator.wikimedia.org/T37540) (owner: Dzahn)
[15:42:33] (CR) Dzahn: "that's here https://gerrit.wikimedia.org/r/#/c/238652/" [puppet] - https://gerrit.wikimedia.org/r/238650 (https://phabricator.wikimedia.org/T110256) (owner: Dzahn)
[15:43:23] operations, Analytics-Kanban, hardware-requests, Patch-For-Review: Request three servers for Pageview API - https://phabricator.wikimedia.org/T111053#1653490 (kevinator) a:Ottomata
[15:44:01] (CR) John F. Lewis: mailman: exim alias for discovery list renames (1 comment) [puppet] - https://gerrit.wikimedia.org/r/238652 (owner: Dzahn)
[15:44:17] (CR) John F. Lewis: [C: 1] "wasn't added to that review. looks good." [puppet] - https://gerrit.wikimedia.org/r/238650 (https://phabricator.wikimedia.org/T110256) (owner: Dzahn)
[15:45:03] operations, Wikimedia-Mailing-lists, Patch-For-Review: switch over mailman service IP - https://phabricator.wikimedia.org/T110139#1653496 (Dzahn) Open>Resolved
[15:45:04] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653497 (Dzahn)
[15:46:18] operations, Wikimedia-Mailing-lists: switch over mailman service IP - https://phabricator.wikimedia.org/T110139#1569669 (Dzahn)
[15:47:39] (PS2) Dzahn: Revert "exim: temp hack to stop exim when on fermium" [puppet] - https://gerrit.wikimedia.org/r/234440
[15:50:02] (PS1) John F. Lewis: lists: TTL up to 1H [dns] - https://gerrit.wikimedia.org/r/239400
[15:50:05] (CR) Milimetric: "It is supposed to die, but there's no other easy way to set up a dashboard from ad-hoc graphs. And it doesn't look like we'll get that pr" [puppet] - https://gerrit.wikimedia.org/r/231144 (owner: Faidon Liambotis)
[15:55:05] akosiaris, funny about "the best time" -- actually i think it was the worst time because we announced it 8 hours before that :)))
[15:55:17] any issues now?
[15:55:59] (CR) Dzahn: [C: 2] Revert "exim: temp hack to stop exim when on fermium" [puppet] - https://gerrit.wikimedia.org/r/234440 (owner: Dzahn)
[16:02:17] (PS1) Glaisher: Remove redundant entries from robots.txt [mediawiki-config] - https://gerrit.wikimedia.org/r/239403 (https://phabricator.wikimedia.org/T104251)
[16:04:12] (PS1) Jdlrobson: Allow testing of WikidataPageBanners on mobile skin on beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/239404
[16:05:36] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653545 (Risker) I'm going to agree with John here that bounces are important. They are flags that there is a problem with the process - the level of importance of...
[16:07:34] !log deactivating BGP with GTT @ eqiad
[16:07:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:13:23] (CR) Filippo Giunchedi: [C: -1] "+1 on the switch to service unit itself" (3 comments) [debs/pybal] - https://gerrit.wikimedia.org/r/239393 (owner: Giuseppe Lavagetto)
[16:14:21] !log elastic in eqiad plugin updates: restarting elastic1021
[16:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:18:50] (CR) Milimetric: [C: 1] "Tested this as well, it works for me. We'll wait until Monday to deploy" [puppet] - https://gerrit.wikimedia.org/r/237688 (https://phabricator.wikimedia.org/T112265) (owner: Ottomata)
[16:19:21] operations: exim4::dkim creates empty key file - https://phabricator.wikimedia.org/T113051#1653582 (Dzahn)
[16:21:40] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1653585 (Selsharbaty-WMF) Thank you for taking care of it, Eliza and John!
[16:23:20] (CR) Negative24: "@Yuvipanda, NP is not a valid password so it sets the VCS user to a no-login user. This is according to the specification in https://secur" [puppet] - https://gerrit.wikimedia.org/r/226573 (owner: Negative24)
[16:24:00] (CR) Negative24: "That period isn't a part of the link" [puppet] - https://gerrit.wikimedia.org/r/226573 (owner: Negative24)
[16:30:23] (CR) Yurik: "More ppl commented about the need for this -- https://en.wikivoyage.org/wiki/Wikivoyage:Travellers%27_pub#Announcing_the_launch_of_Maps" [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[16:32:42] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653603 (JohnLewis) Migration is over, things seem working (receiving emails, tailing logs on fermium show a bit of activity), so I feel confident saying we're on J...
[16:32:47] mutante: ^^
[16:35:55] congrats JohnFLewis
[16:36:38] \o/
[16:36:41] (PS2) BBlack: Added wikivoyage-ev.org to maps referrer check [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[16:36:43] congrats JohnFLewis, mutante!
[16:36:51] yay!
[16:36:53] (CR) BBlack: [C: 2 V: 2] Added wikivoyage-ev.org to maps referrer check [puppet] - https://gerrit.wikimedia.org/r/239279 (owner: Yurik)
[16:37:32] :) :)
[16:38:38] :) :)
[16:38:44] robh: sodium; are we going to decom it or put it in as a spare? (a question for the patches I'm making now for next week or so)
[16:38:47] i'm very happy it's done
[16:39:28] operations, Wikimedia-Mailing-lists: start exim on fermium / revert migration hack - https://phabricator.wikimedia.org/T113045#1653620 (Dzahn) Open>Resolved
[16:39:30] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653621 (Dzahn)
[16:39:51] operations, Wikimedia-Mailing-lists: exim4::dkim creates empty key file - https://phabricator.wikimedia.org/T113051#1653622 (Dzahn)
[16:40:25] "celebrate no more lucid" is literally part of the ticket https://phabricator.wikimedia.org/T110142
[16:41:04] !log mailman now on 2.1.18 and jessie
[16:41:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:41:54] JohnFLewis: likely spares initially
[16:42:33] operations, Wikimedia-Mailing-lists: shutdown sodium, decom - https://phabricator.wikimedia.org/T110142#1653629 (Dzahn)
[16:42:35] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653628 (Dzahn)
[16:42:38] robh: okay. so leaving sodium's mgmt is the correct way right?
[16:42:42] mutante: yay
[16:42:53] JohnFLewis: confirmed, leaving mgmt is correct but pulling everyting else
[16:42:58] okay
[16:43:01] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1511445 (Dzahn)
[16:43:56] greg-g: :)
[16:44:55] (PS1) John F. Lewis: remove sodium from puppet (spare/decom) [puppet] - https://gerrit.wikimedia.org/r/239411
[16:47:54] (PS1) BBlack: new eqiad LVS: turn on LVS/pybal parts [puppet] - https://gerrit.wikimedia.org/r/239413
[16:48:09] (PS2) Faidon Liambotis: Replace Package['git-core'] with Package['git'] [puppet] - https://gerrit.wikimedia.org/r/233853
[16:48:16] (PS7) Faidon Liambotis: Remove support for Ubuntu Lucid/10.04 [puppet] - https://gerrit.wikimedia.org/r/179888
[16:48:25] rebased on top of JohnFLewis's kill sodium change
[16:49:07] (CR) Eevans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/239398 (https://phabricator.wikimedia.org/T92590) (owner: Filippo Giunchedi)
[16:49:13] (PS1) John F. Lewis: remove sodium.wm.o (leaving sodium.mgmt.eqiad.wmnet) [dns] - https://gerrit.wikimedia.org/r/239414
[16:49:20] JohnFLewis: did we resolve https://phabricator.wikimedia.org/T27231 too?
[16:49:38] (PS2) BBlack: new eqiad LVS: turn on LVS/pybal parts [puppet] - https://gerrit.wikimedia.org/r/239413 (https://phabricator.wikimedia.org/T104458)
[16:49:53] (CR) BBlack: [C: 2 V: 2] new eqiad LVS: turn on LVS/pybal parts [puppet] - https://gerrit.wikimedia.org/r/239413 (https://phabricator.wikimedia.org/T104458) (owner: BBlack)
[16:50:06] (CR) Faidon Liambotis: [C: -1] remove sodium from puppet (spare/decom) (2 comments) [puppet] - https://gerrit.wikimedia.org/r/239411 (owner: John F. Lewis)
[16:50:21] mutante: yeah fixed it .13
[16:50:33] also "Mitigate strict DMARC policy on the mailing lists"
[16:50:38] no... .16
[16:50:41] "This is fixed in Mailman 2.1.18"
[16:50:55] well not fixed by itself
[16:50:59] you need to tune some options
[16:51:02] but they exist since .18
[16:51:41] ok, so just unblocked. ack
[16:52:13] (PS2) John F. Lewis: remove sodium from puppet (spare/decom) [puppet] - https://gerrit.wikimedia.org/r/239411 (https://phabricator.wikimedia.org/T110142)
[16:52:16] checks "Improve SSL of lists.wikimedia.org" next
[17:00:05] (CR) Dzahn: [C: 1] Allow testing of WikidataPageBanners on mobile skin on beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/239404 (owner: Jdlrobson)
[17:00:41] (PS6) 20after4: A context manager for managing nested loggers [tools/scap] - https://gerrit.wikimedia.org/r/239028
[17:00:56] (CR) jenkins-bot: [V: -1] A context manager for managing nested loggers [tools/scap] - https://gerrit.wikimedia.org/r/239028 (owner: 20after4)
[17:02:42] jouncebot: next
[17:02:42] In 69 hour(s) and 57 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150921T1500)
[17:03:33] if i'd merge something in mediawiki-config where all the affected files have a "-labs.php" at the end, does it get deployed automatically?
[17:03:53] greg-g: ^ maybe?
[17:04:22] yes
[17:04:46] the affected files don't matter; it's simply that the beta cluster automatically fetches any and all updates to mediawiki-config every N minutes
[17:04:46] ok, then i assume it's ok i just do that without deploying anything and right now
[17:05:03] yes, but you should still merge on tin
[17:05:30] to spare the next deployer from discovering an unmerged patch and from having to reason about whether it affects production or not
[17:05:39] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653740 (Paladox) The current version is 2.1.20 and a 3.0.0 (Show Don't Tell) http://www.gnu.org/software/mailman/
[17:06:10] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653741 (Paladox) 3.0 is the current release.
[17:06:27] mutante: it gets deployed to beta cluster automatically, but not prod, which you should just do so it's clean
[17:06:34] i see
[17:06:46] (and sorry for roping you in mutante i thought this was super trivial :))
[17:07:00] just keen to test some stuff in beta labs before it hits wikivoyage
[17:07:05] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653742 (Paladox) https://gitlab.com/mailman/mailman/blob/master/src/mailman/docs/NEWS.rst
[17:07:55] ori: greg-g: ok, thanks for the info
[17:08:03] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653751 (JohnLewis) 3.0 is indeed the current release but it is not packaged and not advised to upgrade to from 2.1....
[17:08:15] (PS1) BBlack: define interface::manual for jessie LVS eth1-3 tweaks [puppet] - https://gerrit.wikimedia.org/r/239416 (https://phabricator.wikimedia.org/T96375)
[17:10:14] (CR) Ori.livneh: [C: -1] "Thanks! The 'Pythonic' way is to try / except, though, in the name of EAFP (https://docs.python.org/2/glossary.html)" [puppet] - https://gerrit.wikimedia.org/r/239338 (owner: Filippo Giunchedi)
[17:11:19] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653756 (Paladox) So mailman on Wikimedia was running a really old version.
[17:12:00] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653763 (JohnLewis) It was running 2.1.13 which was released in 2008, so yes indeed.
[17:12:01] (CR) BBlack: [C: 2] define interface::manual for jessie LVS eth1-3 tweaks [puppet] - https://gerrit.wikimedia.org/r/239416 (https://phabricator.wikimedia.org/T96375) (owner: BBlack)
[17:12:10] (PS2) Ori.livneh: utils/pcc: jenkinsapi forward/backward compatibility [puppet] - https://gerrit.wikimedia.org/r/239338 (owner: Filippo Giunchedi)
[17:12:29] mutante: if we could have left sodium for 3 more years, we could say we had been running mailman 10 years out of date ;)
[17:13:03] (PS3) Ori.livneh: utils/pcc: jenkinsapi forward/backward compatibility [puppet] - https://gerrit.wikimedia.org/r/239338 (owner: Filippo Giunchedi)
[17:13:29] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653777 (Paladox) Oh ok. Thanks for replying.
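Ori's review at 17:10:14 asks for the jenkinsapi compatibility code to use try/except in the EAFP style ("easier to ask forgiveness than permission") instead of checking first. Below is a minimal sketch contrasting the two styles. `OldClient`, `NewClient`, `get` and `fetch` are hypothetical stand-ins. They are not the jenkinsapi API and not the code in change 239338.

```python
class OldClient:
    """Hypothetical client exposing only the older method name."""

    def fetch(self, name):
        return "old:" + name


class NewClient:
    """Hypothetical client exposing only the newer method name."""

    def get(self, name):
        return "new:" + name


def lookup_lbyl(client, name):
    # "Look before you leap": test for the newer method before calling it.
    if hasattr(client, "get"):
        return client.get(name)
    return client.fetch(name)


def lookup_eafp(client, name):
    # EAFP: assume the newer method exists and fall back only if the
    # attribute lookup itself raises AttributeError.
    try:
        getter = client.get
    except AttributeError:
        getter = client.fetch
    # The call happens outside the try block, so an AttributeError raised
    # *inside* get()/fetch() is not mistaken for a missing method.
    return getter(name)


print(lookup_eafp(NewClient(), "job"))  # new:job
print(lookup_eafp(OldClient(), "job"))  # old:job
```

Keeping only the attribute lookup inside the `try` is deliberate. If the whole call were wrapped, a genuine bug in the newer method could silently trigger the fallback.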
[17:13:39] (CR) Ori.livneh: [C: 2 V: 2] utils/pcc: jenkinsapi forward/backward compatibility [puppet] - https://gerrit.wikimedia.org/r/239338 (owner: Filippo Giunchedi)
[17:13:55] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653778 (Paladox) What would be the improvements from the prevous server it was on. Since it was moved to Jessie.
[17:15:35] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653784 (Paladox) But could this http://lists.wikimedia.org.uk/mailman/ be redirected to http://lists.wikimedia.org...
[17:16:11] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653785 (Dzahn) other improvements just by moving to jessie, include the SSL setup. SSL ciphers -> https://phabrica...
[17:16:39] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1653788 (Dzahn)
[17:16:41] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653786 (Dzahn) Open>Resolved mailed listadmins@ wikitech@, ops@ and also wikimedia-l because somebody had an...
[17:17:45] operations, hardware-requests: eqiad: (1) hardware request for ElasticSearch replication to Labs - 4 weeks use - https://phabricator.wikimedia.org/T112163#1653796 (mark) labs-support VLAN (unlike labs instances etc) are not in row B I think...
[17:18:13] mutante: jdlrobson you? https://phabricator.wikimedia.org/T113060#1653766
[17:18:59] operations, hardware-requests: eqiad: (1) hardware request for ElasticSearch replication to Labs - 4 weeks use - https://phabricator.wikimedia.org/T112163#1653803 (mark) I approve a test with one spare server for a maximum of 6 weeks, after which it will need to be given back in any circumstance. If we w...
[17:19:44] operations, hardware-requests: eqiad: (1) hardware request for ElasticSearch replication to Labs - 4 weeks use - https://phabricator.wikimedia.org/T112163#1653805 (RobH) a:mark>RobH
[17:19:58] greg-g: no, i don't know. i didnt actually merge stuff
[17:20:01] ebernhardson: ^ yay
[17:20:47] greg-g: what does that bug mean?
[17:21:18] operations, Wikimedia-Mailing-lists, user-notice: send follow-up email, announce changes with new mailman version if any that have user impact - https://phabricator.wikimedia.org/T110140#1653816 (Dzahn) >>! In T110140#1653784, @Paladox wrote: > But could this http://lists.wikimedia.org.uk/mailman/ be r...
[17:21:28] jdlrobson: just go to beta cluster: http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page
[17:21:33] or http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:AbuseLog
[17:21:43] ah i see. Mobile is unaffected. Weird
[17:21:49] greg-g: probably the less.php library change
[17:21:51] let me see
[17:21:53] huge uptick in fatals: https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/fatalmonitor
[17:22:05] ori: yep, it's less
[17:22:34] ok, fixing
[17:22:38] yuvipanda: sweet
[17:22:45] Out of interest why is mobile fine given mobile uses less like crazy...?
[17:22:57] greg-g: it's because the vendor patch is still merging
[17:23:32] No it's not
[17:23:34] 17:16:48 Fatal error: Call to undefined method Less_Parser::compileFile() in /mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/tests/phpunit/LessFileCompilationTest.php on line 48
[17:23:42] It'll fail in a few minutes
[17:23:56] ebernhardson: I'll get the hardware network stuff sorted out next week :)
[17:24:02] yuvipanda: i suppose the question then is, what do we need to get out of the 6 weeks? we should probably make a list of goals
[17:24:07] egh
[17:24:09] ebernhardson: yup.
[17:24:10] SSL config for lists now also rated grade A+ by ssllabs check
[17:24:58] ForwardSecrecy, yes , with most browsers
[17:26:31] operations, ops-eqiad, Traffic, netops, Patch-For-Review: rack/setup new eqiad lvs machines - https://phabricator.wikimedia.org/T104458#1653843 (BBlack) Config should all be set correctly now, except for the missing DHCP macaddr stuff for booting 1007-11 (only 1012 defined). On lvs1012 itself:...
[17:26:46] greg-g: since when are the fatals?
[17:27:00] operations, HTTPS: Add Forward Secrecy to all HTTPS sites - https://phabricator.wikimedia.org/T55259#1653848 (Dzahn)
[17:27:10] oh, on beta?
[17:27:23] broken by I826adf9. Should be fixed by I43131887.
[17:27:44] aude: yeah :)
[17:27:54] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[17:28:41] operations, Wikimedia-Mailing-lists, Mail: Enable STARTTLS (both inbound and outbound) on lists - https://phabricator.wikimedia.org/T82576#1653864 (Dzahn) This is now unblocked since mailman has been migrated.
[17:29:10] greg-g: ok
[17:29:46] greg-g: ps. we scheduled data access for wikibooks for next tuesday (https://phabricator.wikimedia.org/T107999)
[17:29:50] operations: Get rid of all Ubuntu Lucid (10.04) installs - https://phabricator.wikimedia.org/T80945#1653869 (Dzahn) This is now unblocked because we switched mailman over to fermium. Just want to keep sodium up for a couple more days, just in case.
[17:29:52] * aude put it on the deployment calendar
[17:30:45] operations: shutdown sodium after mailman has migrated to jessie VM - https://phabricator.wikimedia.org/T82698#1653872 (Dzahn) mailman has now been migrated to fermium, so sodium is not actively used anymore. we will just keep it around for a few more days just in case there are any issues that we notice later
[17:31:07] greg-g: how often does beta update its vendor checkout?
[17:32:12] ori: not sure, honestly, you don't think it's every ~10 minutes with each scap?
[17:32:55] probably
[17:33:11] operations, ops-codfw: wipe working spare disk in codfw - https://phabricator.wikimedia.org/T112783#1653880 (RobH) I've chatted with Mark about this in IRC. We are good to do this, as long as its an older disk we won't otherwise use. Assigning to @Papaul. Please find a SATA disk, spare, small capacity,...
[17:33:23] operations, ops-codfw: wipe working spare disk in codfw - https://phabricator.wikimedia.org/T112783#1653883 (RobH) a:mark>Papaul
[17:36:17] operations, OTRS: move OTRS to a VM? - https://phabricator.wikimedia.org/T105554#1653927 (Krenair)
[17:36:20] operations, OTRS: upgrade iodine to jessie or find a new host with jessie for OTRS - https://phabricator.wikimedia.org/T105125#1653926 (Krenair)
[17:37:21] greg-g: better now
[17:39:21] yay
[17:41:41] operations, OTRS: upgrade iodine to jessie or find a new host with jessie for OTRS - https://phabricator.wikimedia.org/T105125#1653951 (Krenair) Does this block {T74109}?
[17:43:25] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 36, down: 0, dormant: 0, excluded: 0, unused: 0
[17:44:55] operations, OTRS, Security, HTTPS: SSL-config of the OTRS is outdated - https://phabricator.wikimedia.org/T91504#1653983 (Krenair) This depends on {T105125} Which should be done as part of {T105554} Which depends on {T111532} (actually, it looks to me like this was already done, will comment there i...
[17:46:15] operations, OTRS, vm-requests, Patch-For-Review: EQIAD: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1606606 (Krenair) What's left to do here? Migrating ticket.wikimedia.org from iodine to the new VM is T105554.
[17:46:18] operations, OTRS: move OTRS to a VM - https://phabricator.wikimedia.org/T105554#1654001 (Krenair)
[17:46:58] operations, OTRS: move OTRS to a VM - https://phabricator.wikimedia.org/T105554#1654012 (Krenair) A VM was created for it, so I'm assuming yes.
[17:47:24] operations, OTRS, vm-requests: EQIAD: 1 VM request for OTRS - https://phabricator.wikimedia.org/T111532#1654013 (Krenair)
[17:49:05] (CR) EBernhardson: [C: 2] TTMServer: enable wikimedia extra plugin [mediawiki-config] - https://gerrit.wikimedia.org/r/238446 (owner: DCausse)
[17:49:07] Ops-Access-Requests, operations: Requesting access to stat1002 (Hue / Hive) for bmansurov - https://phabricator.wikimedia.org/T113069#1654028 (bmansurov) NEW
[17:52:12] (Merged) jenkins-bot: TTMServer: enable wikimedia extra plugin [mediawiki-config] - https://gerrit.wikimedia.org/r/238446 (owner: DCausse)
[17:52:47] 17:45 < ebernhard> i need to deploy a config change to the cluster, https://gerrit.wikimedia.org/r/#/c/238446/ . We are doing a rolling restart of the elasticsearch cluster to disable an insecure feature, and this config tells mediawiki to stop using the insecure feature
[17:52:58] ebernhardson: for prod things, here please so others know :)
[17:53:45] operations, ops-eqiad: mw1031 has a bad uplink - https://phabricator.wikimedia.org/T95896#1654040 (faidon) I saw you were working on it a few days ago — any news?
[17:54:35] !log ebernhardson@tin Synchronized wmf-config/CommonSettings.php: Replace insecure es usage with usage of a plugin (duration: 00m 12s)
[17:54:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:55:19] chasemp: looks like not (need help) ^
[17:55:28] :)
[17:55:47] Ops-Access-Requests, operations: Requesting access to stat1002 (Hue / Hive) for bmansurov - https://phabricator.wikimedia.org/T113069#1654042 (bmansurov)
[18:03:55] operations, ops-codfw: wipe working spare disk in codfw - https://phabricator.wikimedia.org/T112783#1654070 (Papaul) Ok wil start on this next week.
[18:05:25] (CR) Yuvipanda: [C: -1] "Doesn't seem to be tracking what job is actually being started?" [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[18:06:46] yuvipanda: Oh, duh. Your comment forced me to reread myself and I just realized that was a singleton!
[18:07:04] yuvipanda: Obviously that needs to go into the manifest. :-)
[18:07:30] :)
[18:08:11] * Coren fixes
[18:17:58] operations, ops-eqiad: mw1031 has a bad uplink - https://phabricator.wikimedia.org/T95896#1654126 (Cmjohnson) Nothing new...i don't really know what's wrong. The NIC Is onboard so I cannot swap the NIC. I did swap cable. Can try moving ports on the switch.
[18:24:46] (PS4) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[18:24:54] yuvipanda: Not stupid version ^^
[18:25:20] looking now
[18:25:25] (CR) jenkins-bot: [V: -1] webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[18:26:20] (PS5) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[18:27:22] ^^ same, sans typo
[18:28:48] looking
[18:32:46] (CR) Legoktm: "At that point the extension hasn't been loaded yet. It will only be loaded in Setup.php." [mediawiki-config] - https://gerrit.wikimedia.org/r/239158 (https://phabricator.wikimedia.org/T112204) (owner: Jdlrobson)
[18:34:57] (CR) Andrew Bogott: [C: 1] Move *.labsdb aliases into DNS [puppet] - https://gerrit.wikimedia.org/r/238672 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[18:35:40] operations, ops-eqiad: Decommission mw1031 - https://phabricator.wikimedia.org/T113079#1654203 (Cmjohnson) NEW
[18:36:08] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1654211 (Selsharbaty-WMF) > Should Mailman send you, the list owner, any bounce messages that failed to be detected by the bounce processor? Yes is recommended. @Jo...
[18:39:39] (CR) Yuvipanda: [C: -1] "Can be much clearer IMO" (1 comment) [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[18:39:43] Coren: ^
[18:39:46] andrewbogott: ok going to merge that now
[18:39:55] andrewbogott: which machine is dnsrecursor on?
[18:40:04] holmium
[18:40:08] ok
[18:40:11] (PS6) Yuvipanda: labs: Move *.labsdb aliases into DNS [puppet] - https://gerrit.wikimedia.org/r/238672 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[18:40:43] (PS7) Yuvipanda: labs: Move *.labsdb aliases into DNS [puppet] - https://gerrit.wikimedia.org/r/238672 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[18:41:04] yuvipanda: Picky, picky, picky.
[18:41:06] (CR) Yuvipanda: [C: 2 V: 2] "woot" [puppet] - https://gerrit.wikimedia.org/r/238672 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[18:42:59] operations: Decommission es1001-es1010 - https://phabricator.wikimedia.org/T113080#1654234 (jcrespo) NEW a:jcrespo
[18:45:17] yuvipanda: why datetime objects? Isn't that silly overkill when all we want is to count seconds?
[18:45:54] Coren: why strings? why not character arrays? :)
[18:46:25] Krenair: the patch seems to work yay :)
[18:46:34] Krenair: we can get rid of the /etc/hosts now
[18:46:45] yuvipanda: that's actually a reasonable question if all your strings would have only one character ever and the only thing you did to them was compare them to another single character. :-)
[18:47:14] Coren: it isn't because there is a level of abstraction that allows us to reason about things easier and we should use them
[18:47:32] time.time() is ints, and we are dealing with times and have more appropriate objects available
[18:47:51] and date math is much easier with datetime objects than ints
[18:48:32] yuvipanda: Not sure I agree with that reasoning, but the overkill in that script is pretty irrelevant. So, "meh." Changing to datetimes. :-)
[18:49:11] Coren: with what, datetime objects are easier to do math in than ints?
[18:49:13] operations, hardware-requests, Database: new external storage cluster(s) - https://phabricator.wikimedia.org/T105843#1654271 (jcrespo) Equiad old servers no longer in use, created T113080. Not closing because codfw is still pending.
[18:50:15] PROBLEM - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200)
[18:50:38] yuvipanda: That too (hard to get simpler than "x-y > z"), but I was referring to the general principle of adding a level of abstraction being beneficial when it's not actually required.
[18:50:46] PROBLEM - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200)
[18:51:20] I'm on those mobile warnings
[18:51:21] yuvipanda: but absolutely no skin off my back.
[18:51:28] they are in staging
[18:51:41] (PS7) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[18:51:43] (PS1) Andrew Bogott: Added a test for toollabs cron. [puppet] - https://gerrit.wikimedia.org/r/239438 (https://phabricator.wikimedia.org/T97748)
[18:51:56] PROBLEM - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200)
[18:52:05] ACKNOWLEDGEMENT - Restbase endpoints health on cerium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200) gwicke Investigating
[18:52:05] ACKNOWLEDGEMENT - Restbase endpoints health on praseodymium is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200) gwicke Investigating
[18:52:05] ACKNOWLEDGEMENT - Restbase endpoints health on xenon is CRITICAL: /page/mobile-html/{title} is CRITICAL: Test Get MobileApps Main Page returned the unexpected status 404 (expecting: 200) gwicke Investigating
[18:53:07] operations, netops: Set up NTT transit @ eqdfw, eqord - https://phabricator.wikimedia.org/T111274#1654283 (faidon) Open>Resolved During the past 24 hours, the Equinix patch from NTT's side was done, @RobH put a patch request from our panel to our router which was subsequently completed and I worked...
[18:53:24] \o/
[18:53:38] (CR) Andrew Bogott: [C: 2] Added a test for toollabs cron. [puppet] - https://gerrit.wikimedia.org/r/239438 (https://phabricator.wikimedia.org/T97748) (owner: Andrew Bogott)
[18:53:40] robh: yup!
[18:57:54] (PS6) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[18:58:26] (CR) jenkins-bot: [V: -1] webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[18:58:46] PROBLEM - Exim SMTP on sodium is CRITICAL: Connection refused
[18:59:06] PROBLEM - mailman archives on sodium is CRITICAL: Connection refused
[18:59:26] PROBLEM - mailman list info on sodium is CRITICAL: Connection refused
[18:59:47] (PS7) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[18:59:54] haha byebye sodium.
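The exchange between 18:45 and 18:51 weighs raw epoch seconds against `datetime` objects for an elapsed-time check in the webservicemonitor patch (change 239377). That code is not shown in the log, so what follows is only a sketch. The 300-second threshold and the function names are hypothetical. Also note that `time.time()` actually returns a float of seconds, not an int.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the real value used by webservicemonitor is not shown.
THRESHOLD_SECONDS = 300


def elapsed_too_short_epoch(started: float, now: float) -> bool:
    # Raw epoch seconds (e.g. values from time.time()): plain "x - y < z"
    # arithmetic, with the unit (seconds) implied rather than stated.
    return now - started < THRESHOLD_SECONDS


def elapsed_too_short_datetime(started: datetime, now: datetime) -> bool:
    # Subtracting two datetimes yields a timedelta, so the unit is explicit
    # and the threshold could just as well be written in minutes or hours.
    return now - started < timedelta(seconds=THRESHOLD_SECONDS)


now = datetime(2015, 9, 18, 18, 50, 0)
print(elapsed_too_short_datetime(now - timedelta(minutes=2), now))   # True
print(elapsed_too_short_datetime(now - timedelta(minutes=10), now))  # False
print(elapsed_too_short_epoch(1000.0, 1120.0))                       # True
```

For equivalent inputs the two functions return the same answer. The choice between them is the readability judgment debated in the log.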
[18:59:54] PROBLEM - mailman_ctl on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/mailmanctl
[19:00:05] PROBLEM - HTTPS on sodium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection refused
[19:00:14] PROBLEM - mailman_qrunner on sodium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 38 (list), regex args /mailman/bin/qrunner
[19:00:17] (CR) jenkins-bot: [V: -1] webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362) (owner: coren)
[19:01:33] (PS8) coren: webservicemonitor: some improvements [software/tools-manifest] - https://gerrit.wikimedia.org/r/239377 (https://phabricator.wikimedia.org/T109362)
[19:05:32] (PS2) Krinkle: asset-check: Use mwLoadEvent hook instead of polling modules directly [puppet] - https://gerrit.wikimedia.org/r/235956
[19:09:57] RECOVERY - Restbase endpoints health on xenon is OK: All endpoints are healthy
[19:10:36] RECOVERY - Restbase endpoints health on praseodymium is OK: All endpoints are healthy
[19:11:45] RECOVERY - Restbase endpoints health on cerium is OK: All endpoints are healthy
[19:12:31] (PS1) Jcrespo: Deleting all mention of old servers on production config [mediawiki-config] - https://gerrit.wikimedia.org/r/239441
[19:13:58] (CR) Jdlrobson: "ahhhh php. Thanks for enlightening me Lego!" [mediawiki-config] - https://gerrit.wikimedia.org/r/239158 (https://phabricator.wikimedia.org/T112204) (owner: Jdlrobson)
[19:17:33] (CR) Dduvall: "The separate deployment directory + symlink sounds like a sane approach. What needs to change in this patchset to support that? AFAICT, it" [tools/scap] - https://gerrit.wikimedia.org/r/238839 (https://phabricator.wikimedia.org/T109514) (owner: Dduvall)
[19:19:01] (PS3) Jdlrobson: Replicate browser test config for QuickSurveys [mediawiki-config] - https://gerrit.wikimedia.org/r/239158 (https://phabricator.wikimedia.org/T112204)
[19:25:16] operations, Analytics, Analytics-Cluster, Fundraising Tech Backlog, Fundraising-Backlog: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1654371 (ellery) Is that really necessary? If so, this is an important use case you can use for pitch :). Anyw...
[19:35:01] !log canary deploy of restbase daacf4daa on restbase1001; moving forward so that we can re-enable puppet over the weekend.
[19:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:38:36] PROBLEM - puppet last run on restbase1002 is CRITICAL: CRITICAL: Puppet last ran 22 hours ago
[19:40:25] RECOVERY - puppet last run on restbase1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:45:05] PROBLEM - puppet last run on restbase1005 is CRITICAL: CRITICAL: Puppet last ran 22 hours ago
[19:45:07] !log re-enabled puppet on restbase100*
[19:45:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:45:44] PROBLEM - puppet last run on restbase1004 is CRITICAL: CRITICAL: Puppet last ran 22 hours ago
[19:46:46] RECOVERY - puppet last run on restbase1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:47:25] RECOVERY - puppet last run on restbase1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:48:26] (PS1) Alex Monk: Pull *.labsdb out of /etc/hosts on tools [puppet] - https://gerrit.wikimedia.org/r/239447 (https://phabricator.wikimedia.org/T63897)
[19:49:34] PROBLEM - puppet last run on sodium is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[19:54:03] !log finished deploy of restbase daacf4daa
[19:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:02:41] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1654486 (Eevans)
[20:11:51] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1654508 (Tgr) You can probably just add `2524063603@mms.att.net` to https://lists.wikimedia.org/mailman/admin//?VARHELP=privacy/subscribing/ban_list if you...
[20:13:03] (PS1) Andrew Bogott: Openstack: Move horizon to the same OpenStack version as everything else. [puppet] - https://gerrit.wikimedia.org/r/239450
[20:13:45] (Abandoned) Mattflaschen: Remove Flow_test and Flow_test_talk overrides [mediawiki-config] - https://gerrit.wikimedia.org/r/223453 (https://phabricator.wikimedia.org/T104279) (owner: Mattflaschen)
[20:14:09] (CR) Andrew Bogott: [C: 2] Openstack: Move horizon to the same OpenStack version as everything else. [puppet] - https://gerrit.wikimedia.org/r/239450 (owner: Andrew Bogott)
[20:15:30] (PS7) 20after4: A context manager for managing nested loggers [tools/scap] - https://gerrit.wikimedia.org/r/239028
[20:15:50] andrewbogott, is that everything moved over now then?
[20:15:54] (CR) jenkins-bot: [V: -1] A context manager for managing nested loggers [tools/scap] - https://gerrit.wikimedia.org/r/239028 (owner: 20after4)
[20:16:32] Krenair: not horizon, yet
[20:16:40] err
[20:16:50] (PS1) Andrew Bogott: Openstack: Add config files for Horizon/Kilo [puppet] - https://gerrit.wikimedia.org/r/239451
[20:17:05] okay, when this patch is done, will the migration be over?
[20:18:18] (PS2) Yuvipanda: tools: Pull *.labsdb out of /etc/hosts on tools [puppet] - https://gerrit.wikimedia.org/r/239447 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[20:18:24] (PS3) Yuvipanda: tools: Pull *.labsdb out of /etc/hosts on tools [puppet] - https://gerrit.wikimedia.org/r/239447 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[20:18:37] (CR) Yuvipanda: [C: 2 V: 2] "\o/" [puppet] - https://gerrit.wikimedia.org/r/239447 (https://phabricator.wikimedia.org/T63897) (owner: Alex Monk)
[20:18:37] Krenair: mostly, although I’m sure there will be more followup patches while I tinker with the config
[20:18:57] (PS2) Andrew Bogott: Openstack: Add config files for Horizon/Kilo [puppet] - https://gerrit.wikimedia.org/r/239451
[20:19:05] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: puppet fail
[20:20:12] (CR) Andrew Bogott: [C: 2] Openstack: Add config files for Horizon/Kilo [puppet] - https://gerrit.wikimedia.org/r/239451 (owner: Andrew Bogott)
[20:22:42] (PS8) 20after4: A context manager for managing nested loggers [tools/scap] - https://gerrit.wikimedia.org/r/239028
[20:23:27] (PS1) Andrew Bogott: OpenStack: Add policy files for Horizon/Kilo [puppet] - https://gerrit.wikimedia.org/r/239452
[20:24:30] (CR) Andrew Bogott: [C: 2] OpenStack: Add policy files for Horizon/Kilo [puppet] - https://gerrit.wikimedia.org/r/239452 (owner: Andrew Bogott)
[20:26:15] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[20:26:42] operations, Wikimedia-Mailing-lists: @txt.att.net bounce notifications being sent to list admins - https://phabricator.wikimedia.org/T112912#1654561 (Dzahn) >>! In T112912#1654211, @Selsharbaty-WMF wrote: > >> Should Mailman send you, the list owner, any bounce messages that failed to be detected by the...
[20:30:29] (CR) Thcipriani: "Would be a nice to have to just put the final finishing symlink in-place with a config var like `final_path` (ln -s [deploy_dir]/current " [tools/scap] - https://gerrit.wikimedia.org/r/238839 (https://phabricator.wikimedia.org/T109514) (owner: Dduvall)
[20:31:51] (PS1) Hashar: tox: support passing positional arguments to envs [tools/scap] - https://gerrit.wikimedia.org/r/239453
[20:34:07] !log dropped by_ns indexes on restbase title_revisions tables
[20:34:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:43:03] (CR) 20after4: [C: 2] tox: support passing positional arguments to envs [tools/scap] - https://gerrit.wikimedia.org/r/239453 (owner: Hashar)
[20:43:19] (Merged) jenkins-bot: tox: support passing positional arguments to envs [tools/scap] - https://gerrit.wikimedia.org/r/239453 (owner: Hashar)
[20:47:40] (PS1) Platonides: T113096: Allow 'block' AbuseFilterAction on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/239455
[20:50:22] (CR) Kaldari: [C: 2] Allow testing of WikidataPageBanners on mobile skin on beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/239404 (owner: Jdlrobson)
[20:50:30] (Merged) jenkins-bot: Allow testing of WikidataPageBanners on mobile skin on beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/239404 (owner: Jdlrobson)
[20:52:46] (PS2) Alex Monk: Allow 'block' AbuseFilterAction on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/239455 (https://phabricator.wikimedia.org/T113096) (owner: Platonides)
[20:52:49] !log restart es on elastic1025 to disable dynamic scripting
[20:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:53:12] (CR) Alex Monk: [C: -1] "see ticket" [mediawiki-config] - https://gerrit.wikimedia.org/r/239455 (https://phabricator.wikimedia.org/T113096) (owner: Platonides)
[20:56:03] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: Set up multi-DC replication for Cassandra - https://phabricator.wikimedia.org/T108613#1654737 (Eevans) I did some preliminary testing of replication correctness today. It was my hope to use `cassandra-stress` for this, but [[https://issu...
[21:03:52] operations, Gitblit, Monitoring: Improve monitoring of https://git.wikimedia.org/ - https://phabricator.wikimedia.org/T94320#1654758 (greg)
[21:04:03] operations, Gitblit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#1654766 (greg)
[21:07:16] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[21:35:40] operations, Gerrit, Patch-For-Review: Wikimedia Gerrit doesn't work if OpenSSH version is higher than 7.0 - https://phabricator.wikimedia.org/T112025#1654892 (Paladox) @JanZerebecki no it still says Unable to negotiate with 208.80.154.81: no matching key exchange method found. Their offer: diffie-hell...
[21:37:22] (CR) Paladox: "Dosent fix ssh issue." [gerrit/plugins] - https://gerrit.wikimedia.org/r/237918 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[21:37:58] operations, Wikimedia-Mailing-lists: Change Mailman master password - https://phabricator.wikimedia.org/T110949#1654900 (Dzahn) >>! In T110949#1590926, @RobH wrote: > Also I suggest these lists (of who has them) be kept someplace public. If its simply a public listing, not any kind of software maintained...
[21:43:10] (PS8) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[21:43:12] (PS1) Andrew Bogott: toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504
[21:44:09] (CR) jenkins-bot: [V: -1] toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449) (owner: Andrew Bogott)
[21:44:11] (CR) jenkins-bot: [V: -1] toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504 (owner: Andrew Bogott)
[21:46:11] (PS9) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[21:46:13] (PS2) Andrew Bogott: toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504
[21:46:55] (CR) jenkins-bot: [V: -1] toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449) (owner: Andrew Bogott)
[21:47:02] (CR) jenkins-bot: [V: -1] toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504 (owner: Andrew Bogott)
[21:47:24] (PS10) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[21:47:26] (PS3) Andrew Bogott: toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504
[21:49:35] (CR) Alex Monk: Allow 'block' AbuseFilterAction on eswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/239455 (https://phabricator.wikimedia.org/T113096) (owner: Platonides)
[21:50:54] (PS1) GWicke: Switch RESTBase logging to logstash1002.eqiad [puppet] - https://gerrit.wikimedia.org/r/239506 (https://phabricator.wikimedia.org/T112985)
[21:57:01] (PS11) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[21:57:03] (PS4) Andrew Bogott: toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504
[21:57:33] (CR) Ori.livneh: [C: 2] "Yeah, seems like a sensible strategy for now." [puppet] - https://gerrit.wikimedia.org/r/239506 (https://phabricator.wikimedia.org/T112985) (owner: GWicke)
[21:58:53] operations, MediaWiki-Logging: Set up a service IP for logstash - https://phabricator.wikimedia.org/T113104#1654985 (GWicke) NEW
[21:59:11] (PS12) Andrew Bogott: toolschecker: test for labsdb1004 [puppet] - https://gerrit.wikimedia.org/r/239183 (https://phabricator.wikimedia.org/T107449)
[21:59:14] (PS5) Andrew Bogott: toolschecker: Add tests for starting/stopping web services [puppet] - https://gerrit.wikimedia.org/r/239504
[22:00:11] operations, RESTBase, Services, Patch-For-Review: RESTBase logging broken in both production & staging - https://phabricator.wikimedia.org/T112985#1654998 (GWicke) Open>Resolved >>! In T112985#1652188, @GWicke wrote: > Thanks, @ori. > > I'll keep this open for now to track some follow-up work...
[22:01:26] operations, Wikimedia-Mailing-lists: Change Mailman master password - https://phabricator.wikimedia.org/T110949#1655011 (Dzahn) Ok, i used this opportunity to reset the master (site) password and also the list creator password. I updated the file that ops has access to. I gave the new list creator pass...
[22:01:42] (CR) QChris: "> Dosent fix ssh issue." [gerrit/plugins] - https://gerrit.wikimedia.org/r/237918 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[22:02:22] operations, Wikimedia-Mailing-lists: Change Mailman master password - https://phabricator.wikimedia.org/T110949#1655012 (Dzahn) @Jalexander So i put on that wiki page that it's "planned" to give the password also to Maggie. Do you wanna share it with her?
[22:03:06] operations, Wikimedia-Mailing-lists: Change Mailman master password - https://phabricator.wikimedia.org/T110949#1655013 (Dzahn) Open>Resolved
[22:04:09] operations, Wikimedia-Mailing-lists: TTL back up to normal 1H - https://phabricator.wikimedia.org/T110141#1655023 (Dzahn)
[22:04:11] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1655022 (Dzahn)
[22:04:35] operations, Wikimedia-Mailing-lists: TTL back up to normal 1H - https://phabricator.wikimedia.org/T110141#1569686 (Dzahn) will do this once sodium is actually down.
[22:04:53] operations: shutdown sodium after mailman has migrated to jessie VM - https://phabricator.wikimedia.org/T82698#1655026 (Dzahn)
[22:04:55] operations, Wikimedia-Mailing-lists: TTL back up to normal 1H - https://phabricator.wikimedia.org/T110141#1655025 (Dzahn)
[22:05:03] operations, Wikimedia-Mailing-lists: TTL back up to normal 1H - https://phabricator.wikimedia.org/T110141#1655027 (Dzahn) p:High>Normal
[22:06:05] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1511534 (Dzahn)
[22:06:56] operations, Wikimedia-Mailing-lists, Mail: Enable STARTTLS (both inbound and outbound) on lists - https://phabricator.wikimedia.org/T82576#1655038 (Dzahn)
[22:06:58] operations: Get rid of all Ubuntu Lucid (10.04) installs - https://phabricator.wikimedia.org/T80945#1655039 (Dzahn)
[22:07:03] operations, Wikimedia-Mailing-lists: Mailman Upgrade (Jessie & Mailman 2.x) and migration to a VM - https://phabricator.wikimedia.org/T105756#1655034 (Dzahn) Open>Resolved all blockers closed. that closes the tracking ticket :)
[22:07:43] operations: Change Google Webmaster password for noc@ - https://phabricator.wikimedia.org/T110951#1655046 (Dzahn) a:Dzahn>Jalexander @Jalexander let me know by assigning it back to me when done? thanks!
[22:09:50] operations, Mail: Upgrade Exim to >=4.73 - https://phabricator.wikimedia.org/T83541#1655050 (Dzahn) exim on fermium is 4.84-8 exim on sodium is not used anymore now
[22:10:02] operations, Mail: Upgrade Exim to >=4.73 - https://phabricator.wikimedia.org/T83541#1655051 (Dzahn) Open>Resolved
[22:12:03] 2 unmerged changes on tin
[22:12:32] jouncebot: auto blame
[22:15:41] operations, Annual-Report: create git/gerrit repo for annual report 2015 - https://phabricator.wikimedia.org/T112928#1655068 (Dzahn)
[22:17:39] operations, Annual-Report: create git/gerrit repo for annual report 2015 - https://phabricator.wikimedia.org/T112928#1655073 (Dzahn)
[22:18:13] Ops-Access-Requests, operations: Requesting access to stat1002 (Hue / Hive) for bmansurov - https://phabricator.wikimedia.org/T113069#1655083 (Dzahn) p:Triage>Normal
[22:19:02] operations, Gerrit, Patch-For-Review: Wikimedia Gerrit doesn't work if OpenSSH version is higher than 7.0 - https://phabricator.wikimedia.org/T112025#1655085 (QChris) It's not yet expected to work without the workaround. There are two changes linked to this task. One got merged. The other is still pen...
[22:19:15] why isn't logmsgbot in -releng?
[22:23:26] Ops-Access-Requests, operations: Requesting access to stat1002 (Hue / Hive) for bmansurov - https://phabricator.wikimedia.org/T113069#1655096 (Dzahn) @bmansurov Hi, could you let your manager add an approval on this ticket? And, could you sign L3? I see you already have access but likely since before we...
[22:24:05] (PS1) Ori.livneh: Add logmsgbot to #wikimedia-releng [puppet] - https://gerrit.wikimedia.org/r/239510
[22:24:08] greg-g: ^
[22:28:46] ori: why was it removed?
[22:28:59] dunno, i didn't know it was ever there
[22:29:11] I didn't see anything obvious in https://phabricator.wikimedia.org/diffusion/OPUP/history/production/modules/tcpircbot/
[22:29:20] yeah, it's been there for a long time
[22:29:29] are you sure you are not confusing morebots and logmsgbot?
[22:29:29] https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:29:35] oh, uhhh
[22:29:37] hrmm
[22:29:48] well, no, I don't think so
[22:29:49] i think you are :)
[22:30:01] morebots is the one that actually updates the wiki page when you !log
[22:30:01] I am a logbot running on tools-exec-1214.
[22:30:01] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[22:30:01] To log a message, type !log .
[22:30:23] ok, so follow on, why isn't morebots in -releng?
[22:30:25] logmsgbot is just a daemon that takes log messages from scaps and echoes them to irc with a !log so that morebots picks them up
[22:30:35] we've been !log'ing since the 15th, but it's not showig up there
[22:30:37] https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:30:38] dunno! but i vaguely recall it was updated recently
[22:30:46] hrmm, /me looks for that one
[22:30:52] E_TOOMANYBOTS
[22:31:03] that's not in POSIX!
[22:32:05] greg-g: I4ecad230537488a9b0b52ad07f24ad491604e3ba went out yesterday, but nothing obvious there
[22:32:39] Ops-Access-Requests, operations: Requesting access to stat1002 (Hue / Hive) for bmansurov - https://phabricator.wikimedia.org/T113069#1655108 (dr0ptp4kt) Approved.
[22:32:50] ori: andrew got it (qa-morebot, even) back
[22:32:52] ori: thanks!
[22:32:56] cool, np
[22:34:53] btw, bd's tool is still rocking it and working :) https://tools.wmflabs.org/sal/releng
[22:35:58] (CR) Greg Grossmeier: [C: -1] "I was confused, it was morebots. Thanks anywho!" [puppet] - https://gerrit.wikimedia.org/r/239510 (owner: Ori.livneh)
[22:36:09] (Abandoned) Ori.livneh: Add logmsgbot to #wikimedia-releng [puppet] - https://gerrit.wikimedia.org/r/239510 (owner: Ori.livneh)
[22:49:55] (CR) Dduvall: "I created a new task for it." [tools/scap] - https://gerrit.wikimedia.org/r/238839 (https://phabricator.wikimedia.org/T109514) (owner: Dduvall)
[22:54:45] (PS1) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[22:54:52] (PS2) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[22:55:45] (CR) jenkins-bot: [V: -1] WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511 (owner: Rush)
[22:56:02] (CR) Kevinator: [C: -1] "yes, we should still wait. We need to keep some data for legal purposes." [puppet] - https://gerrit.wikimedia.org/r/197081 (https://phabricator.wikimedia.org/T83531) (owner: ArielGlenn)
[22:56:59] (PS3) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[22:57:07] (PS4) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[22:57:47] (CR) jenkins-bot: [V: -1] WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511 (owner: Rush)
[23:00:58] (CR) Paladox: [C: 1] Make gerrit offer newer key exchange algorithms for new sshs [puppet] - https://gerrit.wikimedia.org/r/237753 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[23:01:27] (PS5) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[23:01:34] (PS6) Rush: WIP: elastic: sane diamond collector [puppet] - https://gerrit.wikimedia.org/r/239511
[23:03:36] (CR) Paladox: Make gerrit offer newer key exchange algorithms for new sshs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/237753 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[23:03:42] operations, ops-eqiad: Decommission mw1031 - https://phabricator.wikimedia.org/T113079#1655172 (Dzahn) p:Triage>Normal
[23:05:03] (CR) EBernhardson: "i like the concept over all and hope it leads to more reliable data." [puppet] - https://gerrit.wikimedia.org/r/239511 (owner: Rush)
[23:06:04] (PS1) Dzahn: remove mw1031 from dsh groups and DHCP [puppet] - https://gerrit.wikimedia.org/r/239513 (https://phabricator.wikimedia.org/T113079)
[23:07:24] (CR) Dzahn: [C: 1] "thanks @qchris for your work on fixing this issue!:)" [puppet] - https://gerrit.wikimedia.org/r/237753 (https://phabricator.wikimedia.org/T112025) (owner: QChris)
[23:12:33] operations, Wikimedia-Mailing-lists: exim4::dkim creates empty key file - https://phabricator.wikimedia.org/T113051#1655192 (Dzahn) p:Triage>Normal
[23:14:28] (CR) Paladox: [C: 1] Ensure gerrit's plugins are kept in sync with plugin repo [puppet] - https://gerrit.wikimedia.org/r/238976 (owner: QChris)