[00:00:18] (PS1) Ori.livneh: Update wgResourceLoaderSources[metawiki] to not use bits [mediawiki-config] - https://gerrit.wikimedia.org/r/208312
[00:00:49] (CR) Ori.livneh: [C: 2] Update wgResourceLoaderSources[metawiki] to not use bits [mediawiki-config] - https://gerrit.wikimedia.org/r/208312 (owner: Ori.livneh)
[00:00:54] (Merged) jenkins-bot: Update wgResourceLoaderSources[metawiki] to not use bits [mediawiki-config] - https://gerrit.wikimedia.org/r/208312 (owner: Ori.livneh)
[00:02:39] !log ori Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 16s)
[00:02:47] Logged the message, Master
[00:04:45] (PS6) Paladox: Rename $wmincClosedWikis to $wgwmincClosedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[00:04:51] (PS7) Paladox: Rename $wmincClosedWikis to $wgwmincClosedWikis [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[00:10:43] (PS1) GWicke: Add x-host-basePath config for /api/rest_v1/ entry point [puppet] - https://gerrit.wikimedia.org/r/208313
[00:12:56] RECOVERY - puppet last run on ms-fe3001 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[00:13:40] (PS1) Dzahn: fix w3c and sourceforge wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208314 (https://phabricator.wikimedia.org/T97834)
[00:14:13] (PS6) GWicke: Enable group1 wikis in RESTBase [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452)
[00:14:17] (CR) Dzahn: [C: 2] fix w3c and sourceforge wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208314 (https://phabricator.wikimedia.org/T97834) (owner: Dzahn)
[00:14:27] (CR) Dzahn: [V: 2] fix w3c and sourceforge wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208314 (https://phabricator.wikimedia.org/T97834) (owner: Dzahn)
[00:27:45] (CR) Eevans: [C: 1] Add commons to restbase config [puppet] - https://gerrit.wikimedia.org/r/208193 (https://phabricator.wikimedia.org/T97840) (owner: GWicke)
[00:28:16] (PS1) Dzahn: fix sourceforce wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208316 (https://phabricator.wikimedia.org/T97834)
[00:28:51] (PS2) Dzahn: fix sourceforce wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208316 (https://phabricator.wikimedia.org/T97834)
[00:30:21] operations, Wikimedia-Apache-configuration: HTTP/1.1 compliance of bits.wikimedia.org/.../load.php - https://phabricator.wikimedia.org/T30345#1253908 (Krinkle)
[00:31:25] (PS3) Dzahn: fix sourceforce wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208316 (https://phabricator.wikimedia.org/T97834)
[00:32:10] (CR) Dzahn: [C: 2 V: 2] fix sourceforce wiki updates [debs/wikistats] - https://gerrit.wikimedia.org/r/208316 (https://phabricator.wikimedia.org/T97834) (owner: Dzahn)
[00:34:24] operations, Traffic, Varnish: Move bits traffic to text/mobile clusters - https://phabricator.wikimedia.org/T95448#1253914 (Krinkle)
[00:34:25] operations, Traffic, Performance: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1253915 (Krinkle)
[00:40:51] (PS1) Ori.livneh: Update updateBranchPointers for static/ [mediawiki-config] - https://gerrit.wikimedia.org/r/208317
[00:41:15] (CR) Ori.livneh: [C: 2] Update updateBranchPointers for static/ [mediawiki-config] - https://gerrit.wikimedia.org/r/208317 (owner: Ori.livneh)
[00:41:21] (Merged) jenkins-bot: Update updateBranchPointers for static/ [mediawiki-config] - https://gerrit.wikimedia.org/r/208317 (owner: Ori.livneh)
[00:44:16] operations, Wikimedia-Apache-configuration: Verify etag optimization for bits.wikimedia.org - https://phabricator.wikimedia.org/T26531#1253925 (Krinkle)
[00:45:03] operations, Traffic, Varnish: Move bits traffic to text/mobile clusters - https://phabricator.wikimedia.org/T95448#1253930 (ori)
[00:45:04] operations, Traffic, Performance: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1253929 (ori) Open>Resolved
[00:45:39] operations, Wikimedia-Apache-configuration: Verify etag optimization for bits.wikimedia.org - https://phabricator.wikimedia.org/T26531#1253932 (Krinkle) Open>declined a:Krinkle Indeed. Afaik our HTTP 304 Not Modified responses, and Last-Modified information for static files is working fine.
[00:45:42] (PS1) Dzahn: fix largest_query/largest_html table [debs/wikistats] - https://gerrit.wikimedia.org/r/208318 (https://phabricator.wikimedia.org/T97833)
[00:46:31] (CR) Dzahn: [C: 2 V: 2] fix largest_query/largest_html table [debs/wikistats] - https://gerrit.wikimedia.org/r/208318 (https://phabricator.wikimedia.org/T97833) (owner: Dzahn)
[00:50:26] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected
[01:03:35] Ops-Access-Requests, operations: Requesting stat1002/1003 access for sniedzielski - https://phabricator.wikimedia.org/T97866#1253951 (Dzahn)
[01:03:36] Ops-Access-Requests, operations: Requesting stat1002/1003 access for sniedzielski - https://phabricator.wikimedia.org/T97866#1253953 (Dzahn)
[01:03:54] Ops-Access-Requests, operations: Requesting stat1002/1003 access for sniedzielski - https://phabricator.wikimedia.org/T97866#1253957 (Dzahn) p:Triage>Normal
[01:05:59] Ops-Access-Requests, operations, Patch-For-Review: Add andyrussg to udp2log-users group to allow him to verify kafkatee generated fundraising log files on erbium - https://phabricator.wikimedia.org/T97860#1253960 (Dzahn) p:Triage>Normal
[01:09:40] Puppet, operations, Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1253962 (Dzahn) p:Triage>High
[01:14:54] (CR) Yuvipanda: [C: 1] "Seems to work" [puppet] - https://gerrit.wikimedia.org/r/205897 (owner: Andrew Bogott)
[01:47:14] (CR) 20after4: [C: 2] Add group1.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/208264 (owner: Ori.livneh)
[01:53:38] hmm, so the central notice translation system seems to be having issues, it's not loading the menu required to publish the translations (it's showing up as basically an empty menu). Anyone know changes that could have done that?
[02:15:54] !log catrope Synchronized php-1.26wmf4/extensions/WikiEditor: Fix data gathering bug (duration: 00m 15s)
[02:16:06] Logged the message, Master
[02:27:20] !log l10nupdate Synchronized php-1.26wmf3/cache/l10n: (no message) (duration: 08m 11s)
[02:27:34] Logged the message, Master
[02:32:03] !log LocalisationUpdate completed (1.26wmf3) at 2015-05-02 02:31:00+00:00
[02:32:09] Logged the message, Master
[02:32:27] !log catrope Synchronized php-1.26wmf3/extensions/WikiEditor: Fix data gathering bug (duration: 00m 25s)
[02:32:33] Logged the message, Master
[02:34:57] PROBLEM - puppet last run on mw2071 is CRITICAL puppet fail
[02:41:47] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[02:48:49] !log l10nupdate Synchronized php-1.26wmf4/cache/l10n: (no message) (duration: 07m 01s)
[02:49:00] Logged the message, Master
[02:53:39] !log LocalisationUpdate completed (1.26wmf4) at 2015-05-02 02:52:36+00:00
[02:53:47] Logged the message, Master
[02:54:17] RECOVERY - puppet last run on mw2071 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures
[03:24:03] !log Granted self admin rights on metawiki temporarily to debug a CentralNotice issue.
[03:24:14] Logged the message, Master
[03:26:37] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[03:36:16] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[03:52:16] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[03:53:46] PROBLEM - LVS HTTP IPv4 on ocg.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 {channel:frontend.error,request:{id:1430538820423-88343},error:{message:Status check failed (redis failure?)}} - 232 bytes in 0.024 second response time
[03:55:19] RECOVERY - LVS HTTP IPv4 on ocg.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.101 second response time
[04:02:06] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[04:08:26] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[04:56:18] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[05:19:32] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat May 2 05:18:29 UTC 2015 (duration 18m 28s)
[05:19:38] Logged the message, Master
[05:34:11] Ops-Access-Requests, operations: Requesting stat1002/1003 access for sniedzielski - https://phabricator.wikimedia.org/T97866#1254124 (Sniedzielski) Not a big deal, but I think this should be under user "niedzielski" since that's what I use for Gerrit.
[05:34:38] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[05:39:27] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[06:08:41] !log signed puppet certs manually on virt1000
[06:08:50] Logged the message, Master
[06:30:17] PROBLEM - puppet last run on mw2168 is CRITICAL puppet fail
[06:30:37] PROBLEM - puppet last run on cp3016 is CRITICAL Puppet has 1 failures
[06:31:38] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 2 failures
[06:34:07] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 1 failures
[06:35:17] PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 1 failures
[06:36:37] PROBLEM - puppet last run on mw1003 is CRITICAL Puppet has 1 failures
[06:36:56] PROBLEM - puppet last run on mw2013 is CRITICAL Puppet has 1 failures
[06:37:07] PROBLEM - puppet last run on mw2123 is CRITICAL Puppet has 1 failures
[06:37:17] PROBLEM - puppet last run on mw2082 is CRITICAL Puppet has 1 failures
[06:45:06] RECOVERY - puppet last run on cp3016 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:46:16] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:37] RECOVERY - puppet last run on mw2013 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures
[06:46:57] RECOVERY - puppet last run on mw2082 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:06] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:49:27] RECOVERY - puppet last run on mw2168 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:09:07] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[07:39:37] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[07:50:47] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[08:08:27] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[08:11:37] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[08:19:52] operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1254204 (Visdaviva) @Nemo_bis the Konkani community has already done 91% of the Mediawiki (most important messages) of this 80% were rev...
[08:24:36] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 66.67% of data above the critical threshold [24.0]
[08:36:37] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 14.29% of data above the critical threshold [500.0]
[08:49:27] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[08:51:46] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0]
[08:54:13] operations, Traffic, Performance: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1254220 (Nemo_bis) Thanks!
[08:58:07] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[09:14:07] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[09:23:17] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 15.38% of data above the critical threshold [500.0]
[09:34:18] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[09:38:09] operations, Wikidata, Wikimedia-Language-setup, Wikimedia-Site-requests, Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1254246 (Glaisher) >>! In T96468#1251881, @Krenair wrote: > So before we create this wiki, we need those translations to be approved and...
[09:51:07] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[10:16:17] https://commons.wikimedia.org/wiki/MediaWiki:Common.js/secure_new.js if I am right, this can be switched off? Are all Wikimedia sites using forcehttps?
[10:25:15] * Steinsplitter pokes matanya
[13:30:49] operations, MediaWiki-DjVu, MediaWiki-General-or-Unknown, Multimedia, and 3 others: img_metadata queries for Djvu files regularly saturate s4 slaves - https://phabricator.wikimedia.org/T96360#1254487 (faidon) >>! In T96360#1253818, @aaron wrote: > It probably just hits < $this->pageCou...
[14:49:28] PROBLEM - puppet last run on cp4003 is CRITICAL puppet fail
[15:05:36] RECOVERY - puppet last run on cp4003 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[15:39:06] operations, Wikimedia-Mailing-lists: move analytics-internal list to analytics-wmf - https://phabricator.wikimedia.org/T97618#1254595 (JohnLewis) @kevinator I'm wondering if this is really necessary with the removal of the '-internal' to '-wmf' as it requires a halt to mailman processes during the move....
[15:41:10] godog: re. https://phabricator.wikimedia.org/T97795 - who are going to be list admins?
[15:56:37] PROBLEM - nutcracker port on silver is CRITICAL - Socket timeout after 2 seconds
[15:58:17] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[16:34:16] (PS1) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[16:35:11] (CR) jenkins-bot: [V: -1] Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[16:38:29] (PS2) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[16:39:07] (CR) jenkins-bot: [V: -1] Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[17:10:27] (PS3) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[17:11:06] (CR) jenkins-bot: [V: -1] Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[17:13:42] (PS4) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[17:14:21] (CR) jenkins-bot: [V: -1] Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[17:15:34] 17:14:04 modules/admin/files/GenSysadminTable.py:36:80: E501 line too long (81 > 79 characters)
[17:15:37] omg, seriously
[17:16:52] pep8?
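(Editor's note: the E501 failure quoted at 17:15:34 comes from the pep8 style checker's default 79-character line limit. The snippet below is only a minimal sketch of that length check, not the checker jenkins-bot actually runs; `find_long_lines` and the sample input are hypothetical names made up for illustration.)

```python
MAX_LINE_LENGTH = 79  # limit named in the E501 message above


def find_long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) pairs for lines longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Hypothetical input: one 81-character line followed by a short one.
sample = "x" * 81 + "\nshort line\n"
for number, length in find_long_lines(sample):
    message = "line %d too long (%d > %d characters)"
    print(message % (number, length, MAX_LINE_LENGTH))
```

(A line of exactly 79 characters passes; only longer lines are reported, which is why an 81-character line fails the check.)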
[17:17:00] (PS5) Glaisher: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[17:17:44] (CR) jenkins-bot: [V: -1] Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395 (owner: Alex Monk)
[17:18:09] hoo, yeah
[17:19:56] PROBLEM - puppet last run on cp4011 is CRITICAL puppet fail
[17:21:09] (PS6) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[17:22:24] (PS7) Alex Monk: Add my script for generating meta:System_administrators#List [puppet] - https://gerrit.wikimedia.org/r/208395
[17:37:27] RECOVERY - puppet last run on cp4011 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[17:47:39] (PS1) Hoo man: Add a dedicated Wikibase job runner [puppet] - https://gerrit.wikimedia.org/r/208397
[17:56:06] (CR) Aaron Schulz: [C: -1] "I can't see this helping since jobchron is separate service." [puppet] - https://gerrit.wikimedia.org/r/208397 (owner: Hoo man)
[18:01:09] krenair@terbium:~$ mwscript showJobs.php enwiki --group
[18:01:09] refreshLinks: 13502576 queued; 545 claimed (495 active, 50 abandoned); 0 delayed
[18:01:15] AaronSchulz, is that OK..?
[18:02:00] curious https://ganglia.wikimedia.org/latest/graph.php?r=month&z=xlarge&c=Jobrunners+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report
[18:45:22] (CR) Aaron Schulz: "2015-05-02T18:44:18+0000: Starting job chron loop(s)..." [puppet] - https://gerrit.wikimedia.org/r/208397 (owner: Hoo man)
[18:50:11] (PS1) Alex Monk: Add my dotfiles [puppet] - https://gerrit.wikimedia.org/r/208400
[19:02:53] !log reinstalling analytics1004 and analytics1010 as trusty
[19:03:06] Logged the message, Master
[19:04:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4010 is CRITICAL 11.11% of data above the critical threshold [20000.0]
[19:09:27] RECOVERY - Varnishkafka Delivery Errors per minute on cp4010 is OK Less than 1.00% above the threshold [0.0]
[19:09:56] !log upgrade db1068 trusty, xtrabackup clone from db1056
[19:10:04] Logged the message, Master
[19:12:50] (PS2) Springle: script used for non-replicated dbstore backups [puppet] - https://gerrit.wikimedia.org/r/207680 (https://phabricator.wikimedia.org/T95835)
[19:13:48] (CR) Springle: [C: 2] script used for non-replicated dbstore backups [puppet] - https://gerrit.wikimedia.org/r/207680 (https://phabricator.wikimedia.org/T95835) (owner: Springle)
[19:17:20] Blocked-on-Operations, operations, Scrum-of-Scrums, incident-20150410-flowdataloss, and 2 others: Better backup coverage for X1 database cluster - https://phabricator.wikimedia.org/T95835#1254769 (Springle) Open>Resolved a:Springle
[19:20:08] PROBLEM - salt-minion processes on analytics1010 is CRITICAL: Connection refused by host
[19:20:08] PROBLEM - dhclient process on analytics1010 is CRITICAL: Connection refused by host
[19:20:37] PROBLEM - puppet last run on analytics1010 is CRITICAL: Connection refused by host
[19:20:47] PROBLEM - RAID on analytics1010 is CRITICAL: Connection refused by host
[19:20:47] PROBLEM - configured eth on analytics1010 is CRITICAL: Connection refused by host
[19:20:57] PROBLEM - Disk space on analytics1010 is CRITICAL: Connection refused by host
[19:20:57] PROBLEM - DPKG on analytics1010 is CRITICAL: Connection refused by host
[19:24:56] RECOVERY - dhclient process on analytics1010 is OK: PROCS OK: 0 processes with command name dhclient
[19:24:57] RECOVERY - salt-minion processes on analytics1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[19:25:26] RECOVERY - puppet last run on analytics1010 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures
[19:25:36] RECOVERY - RAID on analytics1010 is OK Active: 4, Working: 4, Failed: 0, Spare: 0
[19:25:36] RECOVERY - configured eth on analytics1010 is OK - interfaces up
[19:25:50] RECOVERY - Disk space on analytics1010 is OK: DISK OK
[19:25:50] RECOVERY - DPKG on analytics1010 is OK: All packages OK
[19:31:20] !log xtrabackup clone db2048, db2049, db2050, db2051, db2052, db2053, db2054 from codfw masters
[19:31:26] Logged the message, Master
[19:45:56] PROBLEM - puppet last run on ms-be1017 is CRITICAL Puppet has 1 failures
[20:01:57] RECOVERY - puppet last run on ms-be1017 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures
[20:25:02] !log Updated jobrunners to c95d565e242e6fa3706c088ddab1cc6f716408e1
[20:25:10] Logged the message, Master
[20:26:15] (CR) Aaron Schulz: "Fixed with https://gerrit.wikimedia.org/r/#/c/208408/" [puppet] - https://gerrit.wikimedia.org/r/208397 (owner: Hoo man)
[20:59:57] PROBLEM - puppet last run on mw2067 is CRITICAL puppet fail
[21:17:27] RECOVERY - puppet last run on mw2067 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures
[22:02:32] (PS1) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[22:08:48] (CR) jenkins-bot: [V: -1] mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472 (owner: Yuvipanda)
[22:09:10] !log ori Synchronized php-1.26wmf3/extensions/Translate/api/ApiQueryMessageGroups.php: I3bc87f3a5: ApiQueryMessageGroups: mark '_canchange' and '_name' as non-API-metadata (duration: 00m 31s)
[22:09:18] Logged the message, Master
[22:16:21] !log ori Synchronized php-1.26wmf4/extensions/Translate/api/ApiQueryMessageGroups.php: I3bc87f3a5: ApiQueryMessageGroups: mark '_canchange' and '_name' as non-API-metadata (duration: 00m 30s)
[22:16:27] Logged the message, Master
[22:16:54] !log Deployed change I3bc87f3a5 to fix UBN! bug T97912. Bug was affecting ability to translate messages needed for running upcoming board election.
[22:16:59] Logged the message, Master
[22:34:23] (PS2) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[22:36:09] (PS3) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[22:38:35] (CR) jenkins-bot: [V: -1] mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472 (owner: Yuvipanda)
[22:40:02] (PS4) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[22:40:04] (PS1) Yuvipanda: zookeeper: Support installing on debian [puppet] - https://gerrit.wikimedia.org/r/208475
[22:40:43] (CR) jenkins-bot: [V: -1] mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472 (owner: Yuvipanda)
[22:41:22] wtf jenkins
[22:49:47] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 1 failures
[22:54:17] (PS5) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[22:54:57] (CR) jenkins-bot: [V: -1] mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472 (owner: Yuvipanda)
[23:00:12] (PS1) Yuvipanda: Make stdlib a submodule [puppet] - https://gerrit.wikimedia.org/r/208476
[23:00:51] (CR) jenkins-bot: [V: -1] Make stdlib a submodule [puppet] - https://gerrit.wikimedia.org/r/208476 (owner: Yuvipanda)
[23:03:26] (CR) Hoo man: "Thanks for looking into this." [puppet] - https://gerrit.wikimedia.org/r/208397 (owner: Hoo man)
[23:03:59] (Abandoned) Hoo man: Add a dedicated Wikibase job runner [puppet] - https://gerrit.wikimedia.org/r/208397 (owner: Hoo man)
[23:05:53] (PS6) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[23:07:26] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:20:39] operations, Wikimedia-Logstash: Elasticsearch not starting on Jessie hosts - https://phabricator.wikimedia.org/T97645#1254877 (bd808) Installing works. I did this on logstash1004 by downloading the deb with cu...
[23:23:08] operations, Wikimedia-Logstash: Elasticsearch not starting on Jessie hosts - https://phabricator.wikimedia.org/T97645#1254891 (bd808) Before starting elasticsearch on logstash1004 I configured the cluster to ignore the new hosts for shard allocation: ``` curl -XPUT 'localhost:9200/_cluster/settings' -d '...
[23:28:16] RECOVERY - ElasticSearch health check for shards on logstash1004 is OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 4, unassigned_shards: 0, timed_out: False, active_primary_shards: 49, cluster_name: production-logstash-eqiad, relocating_shards: 2, active_shards: 147, initializing_shards: 0, number_of_data_nodes: 4
[23:32:51] operations, Wikimedia-Logstash: Elasticsearch not starting on Jessie hosts - https://phabricator.wikimedia.org/T97645#1254894 (bd808) I made a local modification to /etc/elasticsearch/elasticsearch.yml to enable logstash1004 to find the existing cluster: ``` discovery.zen.ping.unicast.hosts: 10.64.32.137...
[23:49:59] (PS1) Yuvipanda: Add .gitreview file [puppet/mesos] - https://gerrit.wikimedia.org/r/208477
[23:50:01] (PS1) Yuvipanda: [WMF-Patch] Get rid of autoinstall functionality [puppet/mesos] - https://gerrit.wikimedia.org/r/208478
[23:55:29] (PS7) Yuvipanda: mesos: import module + add simple role [puppet] - https://gerrit.wikimedia.org/r/208472
[23:56:31] (PS2) Yuvipanda: Make stdlib a submodule [puppet] - https://gerrit.wikimedia.org/r/208476
[23:57:11] (CR) jenkins-bot: [V: -1] Make stdlib a submodule [puppet] - https://gerrit.wikimedia.org/r/208476 (owner: Yuvipanda)
[23:58:59] Interesting...
[23:59:16] There appear to be a few people who have deployment access
[23:59:27] but can't actually merge anything to the deployment branches in gerrit?