[00:00:05] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150813T0000). Please do the needful.
[00:00:13] RECOVERY - Check size of conntrack table on fluorine is OK nf_conntrack is 0 % full
[00:00:14] RECOVERY - RAID on fluorine is OK Active: 10, Working: 10, Failed: 0, Spare: 0
[00:00:33] RECOVERY - DPKG on fluorine is OK: All packages OK
[00:00:52] Well, Google Translate is powered by Google Translator Toolkit, which is mostly Wikimedia-volunteer powered, so yeah, it's a bit circular.
[00:00:54] RECOVERY - puppet last run on fluorine is OK Puppet is currently enabled, last run 1 hour ago with 0 failures
[00:01:14] RECOVERY - configured eth on fluorine is OK - interfaces up
[00:01:15] RECOVERY - salt-minion processes on fluorine is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:01:15] RECOVERY - dhclient process on fluorine is OK: PROCS OK: 0 processes with command name dhclient
[00:02:04] PROBLEM - BGP status on cr2-eqiad is CRITICAL host 208.80.154.197, sessions up: 74, down: 3, shutdown: 0; Peering with AS1273 not established - CW; Peering with AS8218 not established - NEO-ASN; Peering with AS62651 not established -
[00:08:03] PROBLEM - BGP status on cr2-eqiad is CRITICAL host 208.80.154.197, sessions up: 74, down: 3, shutdown: 0; Peering with AS1273 not established - CW; Peering with AS8218 not established - NEO-ASN; Peering with AS62651 not established -
[00:10:07] (PS1) Ori.livneh: xenon-generate-svgs: make flamegraph.pl invocations nice [puppet] - https://gerrit.wikimedia.org/r/231196
[00:10:24] (CR) Ori.livneh: [C: 2 V: 2] xenon-generate-svgs: make flamegraph.pl invocations nice [puppet] - https://gerrit.wikimedia.org/r/231196 (owner: Ori.livneh)
[00:16:37] operations, Traffic, Wikimedia-DNS, Patch-For-Review: DNS request for wikimedia.org (let 3rd party send mail as wikimedia.org) - https://phabricator.wikimedia.org/T107940#1533562 (Dzahn) @CaitVirtue done. I have added that alias. caitlin@benefactors.wm -> cvirtue@wm is active now
[00:17:42] operations, Traffic, Wikimedia-DNS: DNS request for wikimedia.org (let 3rd party send mail as wikimedia.org) - https://phabricator.wikimedia.org/T107940#1533563 (Dzahn)
[00:17:44] PROBLEM - BGP status on cr2-eqiad is CRITICAL host 208.80.154.197, sessions up: 74, down: 3, shutdown: 0; Peering with AS1273 not established - CW; Peering with AS8218 not established - NEO-ASN; Peering with AS62651 not established -
[00:21:17] (CR) Dzahn: [C: -1] Add ferm rules for swift proxies (1 comment) [puppet] - https://gerrit.wikimedia.org/r/223537 (owner: Muehlenhoff)
[00:21:33] PROBLEM - BGP status on cr2-eqiad is CRITICAL host 208.80.154.197, sessions up: 74, down: 3, shutdown: 0; Peering with AS1273 not established - CW; Peering with AS8218 not established - NEO-ASN; Peering with AS62651 not established -
[00:29:01] (PS1) Ori.livneh: Set maximum execution time to 60 seconds [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204)
[00:30:31] (CR) Ori.livneh: [C: -1] "I11c8b0d9 and I893ddd77 have to roll out, first." [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204) (owner: Ori.livneh)
[00:33:43] greg-g: jdlrobson and I would like to retry the WikidataPageBanner deployment tomorrow morning at 9 if possible. I just didn't have time to finish it this morning because of meetings. Tomorrow I have lots more time available though.
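Ori's xenon-generate-svgs patch above makes the flamegraph.pl invocations "nice". Below is a minimal Python sketch of that idea. It assumes the script shells out to flamegraph.pl once per input file. The helper names `flamegraph_command` and `render_svg` are hypothetical, and the log does not show the real script's arguments.

```python
import subprocess


def flamegraph_command(folded_path, title, niceness=19):
    # Hypothetical helper: prefix the call with nice(1) so SVG rendering
    # yields CPU to other work on the host (19 = lowest priority).
    return ["nice", "-n", str(niceness), "flamegraph.pl", "--title", title, folded_path]


def render_svg(folded_path, svg_path, title):
    # flamegraph.pl writes the SVG to stdout, so redirect it into the file.
    with open(svg_path, "wb") as out:
        subprocess.check_call(flamegraph_command(folded_path, title), stdout=out)
```

At niceness 19, a long batch of renders gets a large share of the CPU only when the host is otherwise idle.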
[00:35:12] (CR) Aaron Schulz: [C: 1] Set maximum execution time to 60 seconds [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204) (owner: Ori.livneh)
[00:42:27] (PS1) Ori.livneh: Ensure all Xenon records begin with the script base name [mediawiki-config] - https://gerrit.wikimedia.org/r/231199
[00:42:32] ^ AaronSchulz
[00:45:07] (CR) GWicke: [C: 1] "LGTM once the other patches are deployed." [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204) (owner: Ori.livneh)
[00:46:23] (CR) GWicke: "Oh, and thanks to Ori, too!" [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204) (owner: Ori.livneh)
[00:46:53] (PS2) Ori.livneh: Set maximum execution time to 60 seconds [puppet] - https://gerrit.wikimedia.org/r/231197 (https://phabricator.wikimedia.org/T97204)
[01:05:32] AaronSchulz, RBR Soon(TM)
[01:07:46] !log Depooled mw1041 so it can be set aside for LQT->Flow conversion script (T108601)
[01:07:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:08:18] ...ish
[01:15:57] (CR) Ori.livneh: [C: 2] Ensure all Xenon records begin with the script base name [mediawiki-config] - https://gerrit.wikimedia.org/r/231199 (owner: Ori.livneh)
[01:16:03] (Merged) jenkins-bot: Ensure all Xenon records begin with the script base name [mediawiki-config] - https://gerrit.wikimedia.org/r/231199 (owner: Ori.livneh)
[01:16:48] !log ori@tin Synchronized wmf-config/StartProfiler.php: I482b120289: Ensure all Xenon records begin with the script base name (duration: 00m 12s)
[01:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:26:54] !log Restarted conversion of support desk from LQT->Flow using convertLqtPageOnLocalWiki.php, using hhvm on mw1041
[01:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
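The "Ensure all Xenon records begin with the script base name" change lets stack samples be split by entry point, such as index.php or api.php. The "tag files with entry-point" patch that follows relies on the same idea. The actual change is PHP in wmf-config/StartProfiler.php; this is a Python sketch of the data transformation only. It assumes records use flamegraph.pl's folded format (`frame;frame;frame count`), and both function names are hypothetical.

```python
import os
from collections import defaultdict


def prefix_with_entry_point(stack, script_path):
    """Make the entry script's base name (e.g. 'index.php') the root frame."""
    entry = os.path.basename(script_path)
    frames = stack.split(";") if stack else []
    if frames and frames[0] == entry:
        return stack  # already tagged; don't prefix twice
    return ";".join([entry] + frames)


def split_by_entry_point(folded_lines):
    """Group folded-stack lines ('a;b;c 12') by their first frame."""
    groups = defaultdict(list)
    for line in folded_lines:
        stack = line.rpartition(" ")[0]  # drop the trailing sample count
        groups[stack.split(";", 1)[0]].append(line)
    return dict(groups)
```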
[01:44:15] (CR) GWicke: "@Filippo, in admittedly relatively short testing spikes of >1s GC times were significantly reduced. The tests used more aggressive values " [puppet] - https://gerrit.wikimedia.org/r/227335 (https://phabricator.wikimedia.org/T106619) (owner: GWicke)
[01:49:24] (PS1) Ori.livneh: xenon: stop generating weekly logs; tag files with entry-point [puppet] - https://gerrit.wikimedia.org/r/231204
[01:52:01] (CR) Ori.livneh: [C: 2] xenon: stop generating weekly logs; tag files with entry-point [puppet] - https://gerrit.wikimedia.org/r/231204 (owner: Ori.livneh)
[01:52:34] RECOVERY - Apache HTTP on mw1140 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.117 second response time
[01:52:34] RECOVERY - HHVM rendering on mw1140 is OK: HTTP OK: HTTP/1.1 200 OK - 66327 bytes in 0.428 second response time
[01:53:13] PROBLEM - HHVM rendering on mw1155 is CRITICAL - Socket timeout after 10 seconds
[01:54:54] RECOVERY - HHVM rendering on mw1155 is OK: HTTP OK: HTTP/1.1 200 OK - 66343 bytes in 0.684 second response time
[01:56:54] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 8.33% of data above the critical threshold [500.0]
[02:00:03] RECOVERY - HHVM queue size on mw1140 is OK Less than 30.00% above the threshold [10.0]
[02:00:04] RECOVERY - HHVM busy threads on mw1140 is OK Less than 30.00% above the threshold [57.6]
[02:05:04] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-tools/snapshot is not accessible: Permission denied
[02:08:45] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[02:38:44] !log l10nupdate@tin Synchronized php-1.26wmf17/cache/l10n: l10nupdate for 1.26wmf17 (duration: 10m 47s)
[02:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:44:49] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf17) at 2015-08-13 02:44:49+00:00
[02:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:01:44] !log l10nupdate@tin Synchronized php-1.26wmf18/cache/l10n: l10nupdate for 1.26wmf18 (duration: 06m 14s)
[03:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:04:52] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf18) at 2015-08-13 03:04:52+00:00
[03:04:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:08:44] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 6.67% of data above the critical threshold [500.0]
[03:24:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[04:27:44] PROBLEM - puppet last run on es2009 is CRITICAL puppet fail
[04:30:17] (PS1) Ori.livneh: xenon: use full path when calling os.path.getctime [puppet] - https://gerrit.wikimedia.org/r/231209
[04:30:30] (CR) Ori.livneh: [C: 2 V: 2] xenon: use full path when calling os.path.getctime [puppet] - https://gerrit.wikimedia.org/r/231209 (owner: Ori.livneh)
[04:46:57] Blocked-on-Operations, MediaWiki-extensions-CentralAuth, Wikimedia-General-or-Unknown, MW-1.26-release, and 2 others: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis - https://phabricator.wikimedia.org/T68699#1533893 (Mattflaschen) I was assuming all, unless som...
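The "use full path when calling os.path.getctime" fix addresses a common Python pitfall. `os.listdir()` returns bare file names, and `os.path.getctime()` resolves a bare name against the current working directory, not the listed directory. Below is a sketch of the corrected pattern. It assumes the script picks the most recently changed log in a directory; `newest_file` and the `.log` suffix are hypothetical.

```python
import os


def newest_file(directory, suffix=".log"):
    """Return the full path of the most recently changed file ending in suffix.

    Each name from os.listdir() is joined with its directory before
    os.path.getctime() sees it; otherwise the lookup would happen relative
    to the current working directory and fail or hit the wrong file.
    """
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    ]
    if not paths:
        return None
    return max(paths, key=os.path.getctime)
```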
[04:54:53] RECOVERY - puppet last run on es2009 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures
[05:02:39] !log ori@tin Synchronized php-1.26wmf18/includes/cache/MessageCache.php: 5f1ab59d31: MessageCache: derive the hash from the cache contents (duration: 00m 12s)
[05:02:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:03:35] !log ori@tin Synchronized php-1.26wmf17/includes/cache/MessageCache.php: 5f1ab59d31: MessageCache: derive the hash from the cache contents (duration: 00m 12s)
[05:03:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:01:34] PROBLEM - puppet last run on sca1001 is CRITICAL puppet fail
[06:09:23] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL host 208.80.154.196, interfaces up: 228, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[06:09:45] PROBLEM - puppet last run on mw2148 is CRITICAL Puppet has 1 failures
[06:21:18] Blocked-on-Operations, MediaWiki-extensions-CentralAuth, Wikimedia-General-or-Unknown, MW-1.26-release, and 2 others: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis - https://phabricator.wikimedia.org/T68699#1533970 (ArielGlenn) Adding @BBlack to comment on the...
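The MessageCache change "derive the hash from the cache contents" means the hash changes exactly when the cached messages change. The Python sketch below illustrates that idea only; it is not MediaWiki's PHP implementation. Canonical JSON with sorted keys and MD5 are assumptions chosen for illustration.

```python
import hashlib
import json


def contents_hash(messages):
    """Hash a message map so equal contents always give an equal hash.

    Sorting keys makes the result independent of insertion order.
    """
    canonical = json.dumps(messages, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```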
[06:24:53] RECOVERY - Router interfaces on cr1-eqiad is OK host 208.80.154.196, interfaces up: 230, down: 0, dormant: 0, excluded: 0, unused: 0
[06:31:04] PROBLEM - puppet last run on mw2024 is CRITICAL puppet fail
[06:31:05] PROBLEM - puppet last run on mc2007 is CRITICAL puppet fail
[06:31:14] PROBLEM - puppet last run on cp1053 is CRITICAL Puppet has 1 failures
[06:32:14] PROBLEM - puppet last run on mw1135 is CRITICAL Puppet has 1 failures
[06:32:44] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 1 failures
[06:33:53] PROBLEM - puppet last run on mw2120 is CRITICAL Puppet has 1 failures
[06:34:55] RECOVERY - puppet last run on mw2148 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:48:16] !log Stopped Support desk LQT->Flow conversion for tonight
[06:48:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:56:05] RECOVERY - puppet last run on mc2007 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:56:13] RECOVERY - puppet last run on cp1053 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw1135 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:57:25] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:44] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:04] RECOVERY - puppet last run on mw2024 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:58:54] RECOVERY - puppet last run on mw2120 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:11:14] PROBLEM - puppet last run on mw2195 is CRITICAL Puppet has 2 failures
[07:11:23] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL host 208.80.154.196, interfaces up: 228, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[07:15:40] (CR) Smalyshev: "CORS/cookies patch merged in T108101" [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[07:28:43] RECOVERY - Router interfaces on cr1-eqiad is OK host 208.80.154.196, interfaces up: 230, down: 0, dormant: 0, excluded: 0, unused: 0
[07:36:46] <_joe_> !log killing all gmond instances on netmon1001, trying to fix ganglia-monitor-aggregator
[07:36:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:38:23] RECOVERY - puppet last run on mw2195 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:46:25] (CR) Amire80: "Thank you!" [puppet] - https://gerrit.wikimedia.org/r/231166 (owner: Nemo bis)
[07:52:55] PROBLEM - test icmp reachability to codfw on ripe-atlas-codfw is CRITICAL - failed 301 probes of 364 (alerts on 19)
[07:59:44] PROBLEM - citoid endpoints health on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:00:41] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Aug 13 08:00:40 UTC 2015 (duration 0m 39s)
[08:00:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:02:04] PROBLEM - graphoid endpoints health on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:04:33] PROBLEM - Check size of conntrack table on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:05:04] PROBLEM - mathoid endpoints health on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:05:05] PROBLEM - zotero on sca1001 is CRITICAL - Socket timeout after 10 seconds
[08:05:23] PROBLEM - RAID on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:05:24] PROBLEM - puppet last run on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
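The RIPE Atlas alert compares the number of failed probes against a fixed "alerts on" count. The sketch below reproduces the two reachability messages in this log. It assumes the check goes CRITICAL once failures reach that count; whether the real comparison is `>` or `>=` is not visible here. The function name is hypothetical.

```python
def icmp_reachability_status(failed, total, alert_on):
    # Assumption: CRITICAL once the failed-probe count reaches alert_on.
    state = "CRITICAL" if failed >= alert_on else "OK"
    return f"{state} - failed {failed} probes of {total} (alerts on {alert_on})"
```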
[08:06:34] PROBLEM - configured eth on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:06:44] PROBLEM - salt-minion processes on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:06:44] PROBLEM - SSH on sca1001 is CRITICAL - Socket timeout after 10 seconds
[08:07:45] !log upgrade cassandra on restbase1005
[08:07:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:08:15] PROBLEM - DPKG on sca1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:08:24] RECOVERY - configured eth on sca1001 is OK - interfaces up
[08:08:33] RECOVERY - salt-minion processes on sca1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:08:34] RECOVERY - SSH on sca1001 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[08:08:54] RECOVERY - zotero on sca1001 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 1.064 second response time
[08:08:55] RECOVERY - mathoid endpoints health on sca1001 is OK: All endpoints are healthy
[08:09:05] RECOVERY - RAID on sca1001 is OK Active: 6, Working: 6, Failed: 0, Spare: 0
[08:09:13] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 43 minutes ago with 0 failures
[08:09:24] RECOVERY - citoid endpoints health on sca1001 is OK: All endpoints are healthy
[08:09:43] RECOVERY - graphoid endpoints health on sca1001 is OK: All endpoints are healthy
[08:10:04] RECOVERY - DPKG on sca1001 is OK: All packages OK
[08:10:04] RECOVERY - Check size of conntrack table on sca1001 is OK nf_conntrack is 1 % full
[08:11:33] !log upgrade cassandra on restbase1006
[08:11:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:14:44] RECOVERY - test icmp reachability to codfw on ripe-atlas-codfw is OK - failed 4 probes of 363 (alerts on 19)
[08:15:36] !log upgrade cassandra on restbase1009
[08:15:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:16:23] <_joe_> !log removing all stale aggregator configs from netmon1001
[08:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:17:51] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534228 (akosiaris)
[08:22:46] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534235 (akosiaris) p:Unbreak!>High Lowering back to hi...
[08:29:08] (PS2) Filippo Giunchedi: update cassandra-metrics-collector version [puppet] - https://gerrit.wikimedia.org/r/230589 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[08:31:06] (CR) Filippo Giunchedi: [C: 2 V: 2] update cassandra-metrics-collector version [puppet] - https://gerrit.wikimedia.org/r/230589 (https://phabricator.wikimedia.org/T101764) (owner: Eevans)
[08:33:02] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534247 (akosiaris) ``` root@sca1001:~# ps -u apertium -fa |...
[08:45:41] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534250 (akosiaris) Some more info ``` akosiaris@sca1001:/va...
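Triage of the stale Apertium processes starts from `ps -u apertium` output. Below is a hypothetical helper that summarizes `ps -eo user=,pid=,ppid=,rss=,comm=` output per user. It reports process counts, resident memory in KiB, and processes whose parent is PID 1. Treating processes reparented to init as "stale" is a heuristic assumption, not a conclusion reached on the ticket.

```python
from collections import Counter


def summarize_ps(text):
    """Summarize `ps -eo user=,pid=,ppid=,rss=,comm=` output.

    Returns (process count per user, RSS in KiB per user, PIDs whose parent
    is PID 1). Orphans reparented to init are one rough sign of children
    left behind by a parent that exited.
    """
    counts = Counter()
    rss_kib = Counter()
    orphans = []
    for line in text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        user, pid, ppid, rss, _comm = fields
        counts[user] += 1
        rss_kib[user] += int(rss)
        if ppid == "1":
            orphans.append(int(pid))
    return counts, rss_kib, orphans
```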
[08:46:29] kart_: ^
[08:46:44] something weird is going on with apertium language pair shutting down
[08:48:49] (PS3) Giuseppe Lavagetto: ganglia-monitor-aggregator: fix upstart script [puppet] - https://gerrit.wikimedia.org/r/228805
[08:52:10] (PS6) Filippo Giunchedi: diamond: service stats puppet integration [puppet] - https://gerrit.wikimedia.org/r/224094
[08:52:12] (PS6) Filippo Giunchedi: diamond: add upstart/systemd service stats [puppet] - https://gerrit.wikimedia.org/r/224093
[08:52:21] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534265 (Unhammer) :-( we have 526 procs for our user, runnin...
[08:52:23] (CR) Filippo Giunchedi: diamond: add upstart/systemd service stats (2 comments) [puppet] - https://gerrit.wikimedia.org/r/224093 (owner: Filippo Giunchedi)
[08:58:10] (PS1) Faidon Liambotis: swift: roll role::swift::icehouse into swift.pp [puppet] - https://gerrit.wikimedia.org/r/231231
[08:58:12] (PS1) Faidon Liambotis: swift: remove support for Ganglia stats [puppet] - https://gerrit.wikimedia.org/r/231232
[08:58:14] (PS1) Faidon Liambotis: swift_new: add precise support [puppet] - https://gerrit.wikimedia.org/r/231233
[08:58:16] (PS1) Faidon Liambotis: swift: reduce the delta with swift_new [puppet] - https://gerrit.wikimedia.org/r/231234
[08:58:18] (PS1) Faidon Liambotis: swift_new: add hiera data for eqiad/esams [puppet] - https://gerrit.wikimedia.org/r/231235
[08:58:20] (PS1) Faidon Liambotis: Switch ms-fe/ms-be esams to swift_new [puppet] - https://gerrit.wikimedia.org/r/231236
[08:58:22] (PS1) Faidon Liambotis: Switch ms-fe/ms-be eqiad to swift_new [puppet] - https://gerrit.wikimedia.org/r/231237
[08:58:24] (PS1) Faidon Liambotis: Kill role::swift::labs [puppet] - https://gerrit.wikimedia.org/r/231238
[08:58:26] (PS1) Faidon Liambotis: Kill swift.pp [puppet] - https://gerrit.wikimedia.org/r/231239
[08:58:28] (PS1) Faidon Liambotis: Rename swift_new to swift [puppet] - https://gerrit.wikimedia.org/r/231240
[08:58:45] <_joe_> paravoid: \o/
[08:59:34] made offline, on the plane
[08:59:39] completely untested, possibly broken
[08:59:45] \o/ it is gerrit time!
[09:00:13] * godog going to blindly merge to make the day interesting
[09:02:41] <_joe_> godog: not blindly, we have the compiler
[09:05:04] true that, but it takes away the thrill!
[09:10:33] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534267 (KartikMistry) @Unhammer - no we're not using '-f' op...
[09:11:06] akosiaris: sorry, was in meeting.
[09:13:01] operations, RESTBase-Cassandra: upgrade RESTBase cluster to Cassandra 2.1.8 - https://phabricator.wikimedia.org/T107949#1534269 (fgiunchedi) Open>Resolved this is completed
[09:13:08] akosiaris: apertium-apy not running in sca1002?
[09:18:05] kart_: nope
[09:18:44] kart_: to be more precise, it is running, it is not actively accepting requests
[09:19:13] Bug?
[09:19:19] Or is it okay?
[09:21:05] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534275 (akosiaris) That one did not return anything. Removi...
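akosiaris distinguishes between apertium-apy "running" and "actively accepting requests" on sca1002. A TCP connect probe is one way to check the second condition. This sketch assumes the service listens on a plain TCP port; the log does not give the port number.

```python
import socket


def accepting_connections(host, port, timeout=2.0):
    """Return True if something completes a TCP handshake on host:port.

    A process can be alive (visible in ps) yet fail this check if it is
    not listening, which is the distinction made in the log.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```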
[09:21:44] kart_: no, that's by design right now
[09:22:25] kart_: but while moving to jessie and scb we will have it accepting connections at both boxes
[09:25:48] (PS7) Filippo Giunchedi: diamond: service stats puppet integration [puppet] - https://gerrit.wikimedia.org/r/224094 (https://phabricator.wikimedia.org/T108027)
[09:25:50] (PS7) Filippo Giunchedi: diamond: add upstart/systemd service stats [puppet] - https://gerrit.wikimedia.org/r/224093 (https://phabricator.wikimedia.org/T108027)
[09:31:24] PROBLEM - puppet last run on sca1001 is CRITICAL puppet fail
[09:32:08] (PS4) Alexandros Kosiaris: maps: Add usernames/passwords to kartotherian config [puppet] - https://gerrit.wikimedia.org/r/230549 (https://phabricator.wikimedia.org/T108610) (owner: Yurik)
[09:35:05] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures
[09:35:13] OOM came out and shot various apertium processes (as well as puppet on sca1001)
[09:35:26] !log restarting apertium on sca1001
[09:35:37] we can safely assume it will not be working very well now
[09:36:03] just stopped it... 50GB of memory freed
[09:36:06] _joe_: ^
[09:36:11] <_joe_> akosiaris: like last time
[09:36:41] <_joe_> ok now we really need to either stop apertium indefinitely or to partition it across different clusters for different languages
[09:36:47] <_joe_> while someone fixes it of course
[09:39:08] well, stopping it is not really an option
[09:39:30] unless we don't come up with anything
[09:39:54] kart_: ^
[09:42:35] PROBLEM - puppet last run on netmon1001 is CRITICAL puppet fail
[09:44:36] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534314 (Unhammer) A simple grep for nno-nob will also show p...
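_joe_ suggests partitioning Apertium across different clusters for different languages. Below is one sketch of a stable default assignment; the cluster names are hypothetical. It hashes the language pair (for example `nno-nob`) with CRC-32 and takes the result modulo the number of clusters. A static, hand-maintained map would serve just as well; this only shows a deterministic fallback.

```python
import zlib


def cluster_for_pair(pair, clusters):
    """Pick a cluster for a language pair such as 'nno-nob'.

    zlib.crc32 gives the same value in every process (unlike Python's
    per-process salted hash() for str), so all callers agree.
    """
    if not clusters:
        raise ValueError("need at least one cluster")
    return clusters[zlib.crc32(pair.encode("utf-8")) % len(clusters)]
```

With plain modulo hashing, adding a cluster moves most pairs to a different cluster; consistent hashing would limit that churn.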
[09:46:25] RECOVERY - puppet last run on netmon1001 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures
[09:52:18] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534330 (Unhammer) Oh, that is odd. It should not be starting...
[09:55:51] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534338 (Unhammer) Wait, what version of apy is this? ```...
[09:58:43] (PS5) Alexandros Kosiaris: maps: Add usernames/passwords to kartotherian config [puppet] - https://gerrit.wikimedia.org/r/230549 (https://phabricator.wikimedia.org/T108610) (owner: Yurik)
[10:05:56] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534346 (akosiaris) >>! In T107270#1534338, @Unhammer wrote:...
[10:08:55] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534357 (akosiaris) >>! In T107270#1534314, @Unhammer wrote:...
[10:13:23] PROBLEM - RAID on es1005 is CRITICAL 1 failed LD(s) (Degraded)
[10:13:47] <_joe_> ouch
[10:13:52] <_joe_> jynus: ^^
[10:15:20] (PS4) Giuseppe Lavagetto: ganglia-monitor-aggregator: fix upstart script [puppet] - https://gerrit.wikimedia.org/r/228805
[10:17:38] thanks, _joe_
[10:17:45] not a huge issue
[10:19:45] akosiaris, this won't work - we constantly will be creating new keyspaces - https://gerrit.wikimedia.org/r/#/c/230549/5/templates/maps/grants.cql.erb
[10:20:02] (CR) Alexandros Kosiaris: [C: 2] maps: Add usernames/passwords to kartotherian config [puppet] - https://gerrit.wikimedia.org/r/230549 (https://phabricator.wikimedia.org/T108610) (owner: Yurik)
[10:20:11] lol
[10:20:46] is there a way to generify this - just like we did with postgres - so that all new keyspaces get the same rights?
[10:21:06] yurik: oh, I did see that comment
[10:21:10] sorry
[10:21:15] I didn't
[10:21:26] constantly be creating new keyspaces ?
[10:21:27] np, it was there about 10 seconds before the CR
[10:22:27] yes, keyspace is how we separate each style from one another. I could of course make them all part of the same keyspace...
[10:22:36] and use different tables
[10:22:54] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534390 (Unhammer) -r57689 is almost a year old and used thre...
[10:23:43] in which case we should probably rename v1 into something like 'maps'
[10:24:15] yurik: not sure of the repercussions of this tbh
[10:25:24] operations, hardware-requests, Database: new external storage cluster(s) - https://phabricator.wikimedia.org/T105843#1534404 (jcrespo) Hardware has already been ordered for eqiad.
[10:25:26] akosiaris, neither do i - i mean, both services could easily use a table name as a config param, so that multiple styles are all stored together. Btw, i'm about to delete v1 -- generating v2
[10:25:37] (Abandoned) Alex Monk: Reduce string URLs to defined constant [mediawiki-config] - https://gerrit.wikimedia.org/r/131914 (https://bugzilla.wikimedia.org/48618) (owner: Withoutaname)
[10:26:25] yurik: yeah, but that may (or may not) lead to performance issues ? not sure
[10:26:31] akosiaris, let's keep all the default rights for cassandra until we have a better idea of consequences
[10:26:47] so I should update this to v2 ?
[10:27:03] akosiaris, better remove the rights at all until we are sure
[10:27:12] and store 'casandra' as the password in the private hive
[10:28:01] otherwise every time we decide to experiment with a new data style, we will have to change puppets - not so good
[10:28:10] normally that would be the part where I rant about using a datastore we are not familiar with
[10:28:22] * yurik bounces that to gwicke
[10:28:36] :D
[10:28:45] yeah, I guessed that would happen
[10:29:02] every system has its funky moments, and you don't usually hit most of them
[10:29:26] problem is not the system, it's the lack of knowledge
[10:29:43] we are in the process of gaining it :D
[10:29:50] anyway, tell you what, I'll update that grants file to do GRANT ALL permission on all keyspaces
[10:30:05] always.. not sure it's worth it though
[10:30:18] I haven't exactly seen anything stellar from that datastore yet
[10:30:24] akosiaris, is there a way to grant READONLY permissions on ALL keyspaces?
[10:30:28] yes
[10:30:39] GRANT SELECT on ALL KEYSPACES
[10:30:49] ah, but not the new ones
[10:30:54] lemme test it but according to the documentation it should work
[10:31:04] operations, hardware-requests, Database: new external storage cluster(s) - https://phabricator.wikimedia.org/T105843#1534434 (jcrespo) es1005 RAID degraded. We will not replace the failed disk as full server replacements are on its way in ``` Device Present ===========...
[10:31:08] unless puppet keeps rerunning ?
[10:31:37] in which case the moment tilerator (which should have all rights) creates a new keyspace, within half an hour kartotherian should pick it up
[10:31:48] list all on all keyspaces of kartotherian ;
[10:31:48] username | resource | permission
[10:31:48] --------------+-----------------+------------
[10:31:48] kartotherian | | SELECT
[10:31:54] no, it indeed does all keyspaces
[10:32:01] or so it seems
[10:32:04] so that would work
[10:32:12] of course we should try it
[10:32:17] so if i create a new keyspace right now, it will auto-grant it?
[10:32:20] but it does seem promising
[10:32:27] seems like it
[10:32:29] (PS5) Giuseppe Lavagetto: ganglia-monitor-aggregator: fix upstart script [puppet] - https://gerrit.wikimedia.org/r/228805
[10:32:50] lemme update that script, do it and see what's going on
[10:32:52] meh, i so wish we did redis before all this so that i could finally stop running it like a true bum (in screen)
[10:32:57] thanks!
[10:32:58] :)
[10:33:13] but it's a non-blocker now, so i shouldn't complain :)
[10:33:40] redis as a user program in screen with no permissions rulez!
[10:33:55] (PS6) Giuseppe Lavagetto: ganglia-monitor-aggregator: fix upstart script [puppet] - https://gerrit.wikimedia.org/r/228805
[10:33:56] also, there are some minor issues about an admin account
[10:34:04] I'd rather we did not use cassandra superuser
[10:34:08] (CR) Giuseppe Lavagetto: [C: 2 V: 2] ganglia-monitor-aggregator: fix upstart script [puppet] - https://gerrit.wikimedia.org/r/228805 (owner: Giuseppe Lavagetto)
[10:34:08] but that's for later
[10:34:18] I did leave a TODO item though in the script
[10:34:21] sure, any other name will do... as long as it's short :)
[10:34:23] <_joe_> so postgres, cassandra, redis
[10:34:35] <_joe_> don't we need some other service there?
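The approach settled on here grants on `ALL KEYSPACES` rather than on named keyspaces. Keyspaces that tilerator creates later are then covered without a puppet change, as the `list all on all keyspaces of kartotherian` output suggests. Below is a Python sketch of rendering such a grants file. The real `grants.cql.erb` template is not shown in the log. The split of tilerator as read-write and kartotherian as read-only follows the discussion, not the merged code.

```python
import re

# Only plain identifiers are accepted, so no CQL quoting is needed.
_SIMPLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_grants(readonly_users, readwrite_users):
    """Render CQL GRANT statements scoped to ALL KEYSPACES.

    Grants on the ALL KEYSPACES resource also cover keyspaces created later.
    """
    lines = []
    for user in list(readwrite_users) + list(readonly_users):
        if not _SIMPLE_NAME.match(user):
            raise ValueError(f"unsupported user name: {user!r}")
    for user in readwrite_users:
        lines.append(f"GRANT ALL PERMISSIONS ON ALL KEYSPACES TO {user};")
    for user in readonly_users:
        lines.append(f"GRANT SELECT ON ALL KEYSPACES TO {user};")
    return "\n".join(lines) + "\n"
```

The result can be verified in cqlsh with `LIST ALL PERMISSIONS ON ALL KEYSPACES OF kartotherian;`, the full form of the query run above.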
[10:34:43] <_joe_> it seems not complex enough
[10:34:44] tilerator
[10:35:10] and nothing else. Like imagemagick, etc
[10:35:31] es1005 has 1 failed disk and 4 disks with errors
[10:36:16] (PS1) Alexandros Kosiaris: maps: GRANT on all KEYSPACES instead of specific ones [puppet] - https://gerrit.wikimedia.org/r/231247
[10:36:36] ACKNOWLEDGEMENT - RAID on es1005 is CRITICAL 1 failed LD(s) (Degraded) Jcrespo Several disks degraded, waiting for full server replacement T105843#1534434
[10:36:55] maybe we shouldn't wait so much for a full replacement :-)
[10:37:00] akosiaris, i found new disks for us
[10:37:00] next time
[10:37:00] http://hardware.slashdot.org/story/15/08/12/2148246/samsung-unveils-v-nand-high-performance-ssds-fast-nvme-card-at-55gb-per-second
[10:37:11] ahaha
[10:37:13] nvme ?
[10:37:14] lol
[10:37:22] 6.4TB!!!
[10:38:10] is nvme bad?
[10:38:21] (CR) Alexandros Kosiaris: [C: 2] maps: GRANT on all KEYSPACES instead of specific ones [puppet] - https://gerrit.wikimedia.org/r/231247 (owner: Alexandros Kosiaris)
[10:38:43] riser cards forevah, who needs those pesky IDE/SATA controllers
[10:41:40] yurik: ok done
[10:41:46] let's see now if it works
[10:42:01] akosiaris, thanks! btw, i noticed that when i stop kartotherian, it auto-restarts
[10:42:41] which is really really bad for tilerator - it may mess up the job queue by taking jobs, crashing, and leaving them hanging. i would need to manually clean up afterwards
[10:43:05] is there a way to ensure that the service stays "down"
[10:43:19] i tried disabling, but it said i have no rights
[10:43:21] how did you stop it ?
[10:43:29] sudo service ... stop
[10:43:48] sudo service .. disable
[10:43:52] oh wait
[10:43:52] no
[10:43:59] that's systemctl talk
[10:43:59] try it with kartotherian btw
[10:44:21] i tried the disable command (don't remember what it was), and it gave me a perm error
[10:45:14] tricky one, lemme think a bit about it
[10:45:56] so, kartotherian should be working with the new setup now, right ?
[10:46:17] its own cassandra/postgresql user
[10:46:27] akosiaris, i would guess so - feel free to kill/restart it
[10:46:34] no one's using it
[10:46:41] yurik: how do we test it's using the DB fine ? zoom level > 15 ?
[10:50:13] PROBLEM - puppet last run on db2054 is CRITICAL puppet fail
[10:50:29] akosiaris, DB fine?
[10:50:38] yurik: seems like it
[10:50:43] i'm confused
[10:50:50] oh sorry
[10:51:06] I meant that kartotherian is using the postgres without any problems
[10:51:36] testing would be fairly easy - 1) ssh to one of the machines with -L to port 4000
[10:51:51] 2) need to switch the sources.yaml to use a postgres
[10:52:21] oh, silly me sources.yaml does not mention a postgres
[10:52:31] ok. username/password though is there indeed
[10:52:47] akosiaris, take a look at the sources.prod2.yaml - that's what i will use
[10:53:20] (PS23) Paladox: Rename all main WikimediaIncubator settings to have a wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[10:53:30] akosiaris, i must leave for a few hours, let's continue this eve?
[10:53:53] i will set it up to use proper things :)
[10:54:06] ok cool
[10:54:18] thx for everything!!!
[10:55:15] Blocked-on-Operations, operations, Discovery, Maps, Discovery-Maps-Sprint: Add Redis to maps cluster - https://phabricator.wikimedia.org/T107813#1534505 (akosiaris)
[10:55:17] operations, Discovery, Maps, Services, and 2 others: Puppetize Tilerator for deployment - https://phabricator.wikimedia.org/T105074#1534504 (akosiaris)
[10:58:00] (CR) Paladox: "This shoulden be blocked because WikimediIncubation should be following mediawiki coding standards which say configs should be prefixed li" [mediawiki-config] - https://gerrit.wikimedia.org/r/207909 (owner: Paladox)
[10:58:10] (PS24) Paladox: Rename all main WikimediaIncubator settings to have a wg prefix [mediawiki-config] - https://gerrit.wikimedia.org/r/207909
[11:00:41] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534538 (akosiaris) >>! In T107270#1534390, @Unhammer wrote:...
[11:13:45] akosiaris: I'll prepare new package. There are some cavets, I'll email.
[11:14:40] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534560 (Unhammer) 4.2 introduced the locks that we currently...
[11:14:55] kart_: tornado is the big one from what I see
[11:15:05] yep
[11:15:54] RECOVERY - puppet last run on db2054 is OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[11:26:44] PROBLEM - puppet last run on netmon1001 is CRITICAL: puppet fail
[11:37:54] <_joe_> that ^^ is on me, I forgot to disable puppet there
[11:39:35] (PS1) Giuseppe Lavagetto: ganglia::monitor::aggregator: fix upstart (take 2) [puppet] - https://gerrit.wikimedia.org/r/231251
[11:48:13] PROBLEM - puppet last run on cp3018 is CRITICAL: puppet fail
[11:51:17] Ops-Access-Requests, operations, LDAP: Add WMF engineer VolkerE to ldap/wmf group - https://phabricator.wikimedia.org/T107985#1534594 (Volker_E) Thanks @SPage and all that made that happen quickly.
[11:54:03] PROBLEM - HTTP on silver is CRITICAL - Socket timeout after 10 seconds
[11:55:53] RECOVERY - HTTP on silver is OK: HTTP OK: HTTP/1.1 302 Found - 418 bytes in 0.985 second response time
[12:00:52] (CR) Giuseppe Lavagetto: [C: 2] ganglia::monitor::aggregator: fix upstart (take 2) [puppet] - https://gerrit.wikimedia.org/r/231251 (owner: Giuseppe Lavagetto)
[12:04:04] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534600 (akosiaris) So hard dependency. I was afraid of that...
[12:07:04] (PS5) Alexandros Kosiaris: Introducing mobileapps role and puppet module [puppet] - https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538)
[12:07:12] (CR) Alexandros Kosiaris: [C: 2 V: 2] Introducing mobileapps role and puppet module [puppet] - https://gerrit.wikimedia.org/r/230788 (https://phabricator.wikimedia.org/T105538) (owner: Alexandros Kosiaris)
[12:14:17] (PS3) Alexandros Kosiaris: Assign mobileapps service to sca cluster [puppet] - https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538)
[12:14:23] (CR) jenkins-bot: [V: -1] Assign mobileapps service to sca cluster [puppet] - https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) (owner: Alexandros Kosiaris)
[12:15:44] RECOVERY - puppet last run on cp3018 is OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[12:22:34] (PS1) Alexandros Kosiaris: new_wmf_service: Use shallow copies for group derivation [puppet] - https://gerrit.wikimedia.org/r/231256
[12:25:44] PROBLEM - puppet last run on netmon1001 is CRITICAL: puppet fail
[12:27:43] RECOVERY - puppet last run on netmon1001 is OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:28:24] (PS1) Florianschmidtwelzow: Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936)
[12:30:56] (PS2) Florianschmidtwelzow: Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936)
[12:34:31] (PS2) Alexandros Kosiaris: new_wmf_service: Use shallow copies for group derivation [puppet] - https://gerrit.wikimedia.org/r/231256
[12:36:14] Blocked-on-Operations, MediaWiki-extensions-CentralAuth, Wikimedia-General-or-Unknown, MW-1.26-release, and 2 others: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis - https://phabricator.wikimedia.org/T68699#1534649 (BBlack) As there's no change to the name or...
[12:37:02] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534650 (Unhammer) Tried making it work with tornado 3; apy -...
[12:37:24] PROBLEM - HTTP on silver is CRITICAL - Socket timeout after 10 seconds
[12:39:00] (CR) BBlack: "I'll rebase this, as I made some invasive changes to the affected manifest since it was originally updated." [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[12:39:13] RECOVERY - HTTP on silver is OK: HTTP OK: HTTP/1.1 302 Found - 418 bytes in 0.363 second response time
[12:39:22] !log Restarted apache2 on silver
[12:39:45] akosiaris: Look like we're better with apertium-apy now.
[12:41:35] andrewbogott: Coren: I think you (or someone else) should have a look at keystone
[12:41:50] hoo: looking, thanks
[12:42:35] hoo: Ninja'ed
[12:42:49] !log restarted keystone on labcontrol1001
[12:42:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:43:17] (PS4) BBlack: wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[12:46:19] (PS5) BBlack: wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[12:47:06] (CR) BBlack: "PS4 rebased onto the new misc structure, etc." [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[12:47:48] (CR) BBlack: [C: 1] wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[12:48:13] (PS1) Alexandros Kosiaris: Assign mobileapps service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/231261 (https://phabricator.wikimedia.org/T105538)
[12:48:58] kart_: early to tell
[12:49:08] but it has already spawned 277 processes
[12:49:31] _joe_: any big hiera changes while I was asleep? I’m seeing "Error 400 on SERVER: Reading data from Tools failed: NoMethodError: undefined method `[]' for nil:NilClass at /etc/puppet/manifests/realm.pp:65” — pretty sure that’s a hiera lookup failing.
[12:49:36] akosiaris: new package coming up, hope that fix will fix that.
[12:49:52] <_joe_> andrewbogott: nto from me for sure
[12:50:04] <_joe_> let's see realm.pp
[12:51:39] (Abandoned) Alexandros Kosiaris: Assign mobileapps service to sca cluster [puppet] - https://gerrit.wikimedia.org/r/230789 (https://phabricator.wikimedia.org/T105538) (owner: Alexandros Kosiaris)
[12:51:54] _joe_ it's $nameservers = [ ipresolve(hiera('labs_recursor'),4) ]
[12:52:19] <_joe_> andrewbogott: so labs_recursor is not defined in hiera, probably?
[12:52:26] <_joe_> I have no idea about all that
[12:52:29] maybe… looking.
[12:52:36] It certainly was defined when I went to bed :)
[12:52:40] <_joe_> and I guess that is an ipresolve error, not a hiera error
[12:53:13] it’s defined in labs.yaml:labs_recursor: "labs-recursor0.wikimedia.org"
[13:11:01] !log graceful’d apache2 on labcontrol1001
[13:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:11:19] (PS1) Alexandros Kosiaris: Remove UTF8 character from nikerabbit realname [puppet] - https://gerrit.wikimedia.org/r/231267
[13:12:28] (CR) Alexandros Kosiaris: [C: 2] Remove UTF8 character from nikerabbit realname [puppet] - https://gerrit.wikimedia.org/r/231267 (owner: Alexandros Kosiaris)
[13:15:51] (PS1) Alexandros Kosiaris: Make nikerabbits' realname have only ASCII characters [puppet] - https://gerrit.wikimedia.org/r/231268
[13:16:10] (CR) Alexandros Kosiaris: [C: 2 V: 2] Make nikerabbits' realname have only ASCII characters [puppet] - https://gerrit.wikimedia.org/r/231268 (owner: Alexandros Kosiaris)
[13:16:26] (PS6) Giuseppe Lavagetto: wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602)
[13:17:40] (PS7) Giuseppe Lavagetto: wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602)
[13:18:09] (CR) Giuseppe Lavagetto: [C: 2] wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[13:18:25] (CR) Alexandros Kosiaris: [C: 2] Assign mobileapps service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/231261 (https://phabricator.wikimedia.org/T105538) (owner: Alexandros Kosiaris)
[13:18:31] (PS2) Alexandros Kosiaris: Assign mobileapps service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/231261 (https://phabricator.wikimedia.org/T105538)
[13:18:41] (CR) Alexandros Kosiaris: [C: 2 V: 2] Assign mobileapps service to scb cluster [puppet] - https://gerrit.wikimedia.org/r/231261 (https://phabricator.wikimedia.org/T105538) (owner: Alexandros Kosiaris)
[13:18:44] RECOVERY - puppet last run on scb1001 is OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:18:45] <_joe_> nooo
[13:18:53] RECOVERY - puppet last run on scb1002 is OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:19:02] (PS8) Giuseppe Lavagetto: wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602)
[13:19:12] (CR) Giuseppe Lavagetto: [V: 2] wikidata query: add misc-web configuration [puppet] - https://gerrit.wikimedia.org/r/229392 (https://phabricator.wikimedia.org/T107602) (owner: Giuseppe Lavagetto)
[13:19:43] operations, ContentTranslation-cxserver, Language-Engineering, MediaWiki-extensions-ContentTranslation, and 3 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1534732 (Unhammer) By the way, I just included toro.py there...
[13:20:18] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1534734 (faidon) >>! In T107532#1525074, @GWicke wrote: > - ops have not clearly stated whether they pref...
[13:24:39] (PS1) Alexandros Kosiaris: Include role mobileapps in role scb [puppet] - https://gerrit.wikimedia.org/r/231269
[13:26:40] someone executing a query from terbium to frwiki master?
[13:26:48] will kill it otherwise
[13:28:49] (CR) Alexandros Kosiaris: [C: 2] Include role mobileapps in role scb [puppet] - https://gerrit.wikimedia.org/r/231269 (owner: Alexandros Kosiaris)
[13:29:11] (PS1) BBlack: Remove mwuser cookie exception [puppet] - https://gerrit.wikimedia.org/r/231271
[13:29:25] (PS3) Alexandros Kosiaris: new_wmf_service: Use shallow copies for group derivation [puppet] - https://gerrit.wikimedia.org/r/231256
[13:29:32] (CR) Alexandros Kosiaris: [C: 2 V: 2] new_wmf_service: Use shallow copies for group derivation [puppet] - https://gerrit.wikimedia.org/r/231256 (owner: Alexandros Kosiaris)
[13:35:58] !log kill custom query hiting s6 master from terbium. Use of a slave is required.
[13:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:42:21] operations, Labs: Investigate whether to use Debian's jessie-backports - https://phabricator.wikimedia.org/T107507#1534816 (coren) Opsen consensus on IRC is that jessie-backports should be disabled fleet-wide and any needed package brought into jessie-wikimedia.
[13:44:01] operations, Labs: Make certain that jessie-backports is disabled fleetwide. - https://phabricator.wikimedia.org/T108941#1534819 (coren) NEW
[13:45:00] operations: Make certain that jessie-backports is disabled fleetwide. - https://phabricator.wikimedia.org/T108941#1534819 (coren)
[13:45:38] (PS2) BBlack: Remove mwuser cookie exception [puppet] - https://gerrit.wikimedia.org/r/231271
[13:46:16] (CR) BBlack: [C: 2 V: 2] Remove mwuser cookie exception [puppet] - https://gerrit.wikimedia.org/r/231271 (owner: BBlack)
[14:05:18] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.16.21, port=8888): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[14:06:27] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.16, port=8888): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[14:08:21] operations, Traffic: Refactor varnish puppet config - https://phabricator.wikimedia.org/T96847#1534963 (BBlack) On the questions about `Vary: Cookie` and our cookie support in VCL, I think I've figured out what I was failing to understand before. The way it works is basically (or at least, is expected to...
[14:14:01] Blocked-on-Operations, MediaWiki-extensions-CentralAuth, Wikimedia-General-or-Unknown, MW-1.26-release, and 2 others: Increase "remember me" login cookie expiry from 30 days to 1 year on Wikimedia wikis - https://phabricator.wikimedia.org/T68699#1534987 (Billinghurst) @bblack makes some very valid...
[14:17:27] (PS2) Ottomata: [WIP] eventlogging: Add statsd_host param to the mysql consumer url [puppet] - https://gerrit.wikimedia.org/r/231170 (https://phabricator.wikimedia.org/T105935) (owner: Madhuvishy)
[14:17:55] operations, Traffic: Refactor varnish puppet config - https://phabricator.wikimedia.org/T96847#1534996 (BBlack) Assuming all of the above works correctly for all requests (and I think that's a big if; I'm trying to parse out what happens in MW to control this, but the controls seem complex and spread out)...
[14:19:29] (CR) Ottomata: [C: 2] [WIP] eventlogging: Add statsd_host param to the mysql consumer url [puppet] - https://gerrit.wikimedia.org/r/231170 (https://phabricator.wikimedia.org/T105935) (owner: Madhuvishy)
[14:24:33] (PS1) Alexandros Kosiaris: maps: Add redis to the cluster [puppet] - https://gerrit.wikimedia.org/r/231276 (https://phabricator.wikimedia.org/T107813)
[14:28:08] (PS2) Alexandros Kosiaris: maps: Add redis to the cluster [puppet] - https://gerrit.wikimedia.org/r/231276 (https://phabricator.wikimedia.org/T107813)
[14:28:21] (PS1) Alexandros Kosiaris: mobileapps: checkout_submodules = true [puppet] - https://gerrit.wikimedia.org/r/231278
[14:28:30] (CR) Alexandros Kosiaris: [C: 2 V: 2] maps: Add redis to the cluster [puppet] - https://gerrit.wikimedia.org/r/231276 (https://phabricator.wikimedia.org/T107813) (owner: Alexandros Kosiaris)
[14:30:01] (PS2) Alexandros Kosiaris: mobileapps: checkout_submodules = true [puppet] - https://gerrit.wikimedia.org/r/231278
[14:30:13] (CR) Alexandros Kosiaris: [C: 2 V: 2] mobileapps: checkout_submodules = true [puppet] - https://gerrit.wikimedia.org/r/231278 (owner: Alexandros Kosiaris)
[14:32:46] (CR) Nemo bis: "Is this bug filed upstream?" [puppet] - https://gerrit.wikimedia.org/r/231267 (owner: Alexandros Kosiaris)
[14:33:16] (PS1) KartikMistry: Updated package to 0.1+svn~61425 [debs/contenttranslation/apertium-apy] - https://gerrit.wikimedia.org/r/231280
[14:35:50] operations, Wikimedia-Language-setup: Rename "be-x-old" to "be-tarask" - https://phabricator.wikimedia.org/T11823#1535084 (thiemowmde)
[14:39:21] (CR) Giuseppe Lavagetto: puppet-compiler: first commit (26 comments) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/228849 (https://phabricator.wikimedia.org/T96802) (owner: Giuseppe Lavagetto)
[14:39:54] (PS15) Giuseppe Lavagetto: puppet-compiler: first commit [software/puppet-compiler] - https://gerrit.wikimedia.org/r/228849 (https://phabricator.wikimedia.org/T96802)
[14:41:08] <_joe_> godog: I think I addressed most of your requests, I'll probably merge this big patch now
[14:42:41] akosiaris, did you do anything to the redis on 2001?
[14:43:15] akosiaris, because for some reason all the data has been deleted there
[14:46:18] (Abandoned) Mforns: Reenable reporting for mobile-reportcard [puppet] - https://gerrit.wikimedia.org/r/228420 (https://phabricator.wikimedia.org/T104379) (owner: Mforns)
[14:46:42] (CR) Alex Monk: [C: -1] "krenair@deployment-bastion:/srv/mediawiki/php-master$ grep MFWikiDataEndpoint extensions/* -R" [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[14:46:43] _joe_: ack thanks, I don't have time to review again today, if you want to go ahead we can revisit later too
[14:48:10] <_joe_> godog: that was my point, let's puppetize the already-working host I have, let's connect that to jenkins, we can make the thing better at a leater point
[14:48:46] (CR) KartikMistry: [C: -1] "Nope. This is FTBFS due to my silly mistakes. Fixing." [debs/contenttranslation/apertium-apy] - https://gerrit.wikimedia.org/r/231280 (owner: KartikMistry)
[14:50:31] akosiaris, ok, it seems you have deployed proper redis on 2001 (thx!), and shut down the current instance. But I will need data migrated, so that I don't have to redo last 2 days of rendering - per http://stackoverflow.com/a/22024286/177275 -- the file is in /home/yurik/redis
[14:50:33] (CR) Florianschmidtwelzow: "Uhh, that's really bad, it was removed in https://gerrit.wikimedia.org/r/#/c/231192/1/includes/config/Wikidata.php / https://gerrit.wikime" [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[14:55:44] operations, Security-Team: can we get rid of rsvg security patch? - https://phabricator.wikimedia.org/T104147#1535187 (Aklapper) > Newer librsvg supports a sane security model by default Does anyone know which version number resembles "newer" or has some upstream bug ID reference? Sigh... For those who wa...
[14:57:18] (PS1) Dzahn: mediawiki: include font packages on all appservers [puppet] - https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777)
[14:59:12] operations, RESTBase, RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1535221 (fgiunchedi) also multiple instances means we'll need to adapt `cassandra-metrics-collector` to discover and work with those, I think @eevans might speak to...
[15:00:04] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150813T1500). Please do the needful.
[15:00:04] kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:27] I can SWAT. kart_ ping!
[15:03:29] thcipriani: pong
[15:03:38] thcipriani: sorry :/
[15:03:45] kart_: np
[15:04:07] !log rebooting labvirt1002
[15:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:05:15] operations, Labs, Labs-Sprint-107, Labs-Sprint-108, and 2 others: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1535250 (coren)
[15:05:22] yo kart_
[15:05:52] huh, no bot today
[15:06:47] kart_: readying wmf18 patch now, FYI
[15:06:56] Okay :)
[15:06:59] thcipriani: I see it :)
[15:07:03] (the bot)
[15:07:12] must be sleepbot
[15:07:16] PROBLEM - Host labvirt1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:08:53] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1535275 (GWicke) Faidon, thanks for the clarification on puppet vs. other options. > that is not particu...
[15:10:43] !log thcipriani@tin Synchronized php-1.26wmf18/extensions/ContentTranslation/modules/tools/ext.cx.tools.images.js: SWAT: Images: validate image id before adapting to prevent js error [[gerrit:231230]] (duration: 00m 12s)
[15:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:10:49] ^ kart_ check please
[15:11:38] aharoni: ^^
[15:11:53] kart_: production, right?
[15:11:57] RECOVERY - Host labvirt1002 is UP: PING OK - Packet loss = 0%, RTA = 2.38 ms
[15:12:04] yep. wmf18 ie testwiki
[15:12:18] I don't call testwiki production.
[15:12:24] ah :)
[15:12:29] then wait for wmf17
[15:12:33] thcipriani: go ahead.
[15:12:48] kart_: for now, test.wikipedia.org ?
[15:13:01] any sudo ops around who can do a good deed? :) need to stop a service, move a file, and start it again
[15:13:20] operations, RESTBase, RESTBase-Cassandra: Cassandra internode TLS encryption - https://phabricator.wikimedia.org/T108953#1535288 (Eevans) NEW a:Eevans
[15:13:20] aharoni: also, mediawiki.org, wikibooks, etc. Pretty much everything but wikipedia
[15:14:00] thcipriani: ContentTranslation is meaningless anywhere except Wikipedia (at least for the next few months)
[15:14:06] kk
[15:14:15] Ops-Access-Requests, operations, Discovery, Maps, Discovery-Maps-Sprint: Grant sudo on map-tests200* for maps team - https://phabricator.wikimedia.org/T106637#1535296 (Yurik)
[15:15:12] aharoni: I would check if CX is still working in test.wp after deployment :)
[15:15:21] but that's fine. We don't break that much.
[15:15:56] PROBLEM - Host labvirt1002 is DOWN: PING CRITICAL - Packet loss = 100%
[15:16:12] +2'd the wmf17 change, waiting on merge and then I'll deploy (darn sleepy bot)
[15:16:32] (CR) KartikMistry: "Tags fixed. Should be fine now." [debs/contenttranslation/apertium-apy] - https://gerrit.wikimedia.org/r/231280 (owner: KartikMistry)
[15:18:02] operations, RESTBase, RESTBase-Cassandra: Cassandra internode TLS encryption - https://phabricator.wikimedia.org/T108953#1535313 (Joe) @Eevans a few questions: - how should the certs be created (I mean, what should be the name on the cert)? - is there some limitation/ reccomendation on key lengths, e...
[15:18:27] RECOVERY - Host labvirt1002 is UP: PING OK - Packet loss = 0%, RTA = 1.97 ms
[15:18:32] jynus, ping
[15:18:51] hey, yurik
[15:19:42] (PS2) KartikMistry: Updated package to 0.1+svn~61425 [debs/contenttranslation/apertium-apy] - https://gerrit.wikimedia.org/r/231280
[15:20:43] !log thcipriani@tin Synchronized php-1.26wmf17/extensions/ContentTranslation/modules/tools/ext.cx.tools.images.js: SWAT: Images: validate image id before adapting to prevent js error [[gerrit:231229]] (duration: 00m 11s)
[15:20:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:20:52] ^ kart_ aharoni check please :)
[15:20:58] jynus, hi! could you do me a quick favor - on maps-test2001, stop redis, move the /var/lib/redis/dump.rdb into dump.rdb.bak, and cp /home/yurik/redis/dump.rdb into its place?
[15:21:09] thcipriani, kart_ - now the real production?
[15:21:19] aharoni: yup
[15:21:23] looking
[15:22:29] thcipriani: when I click the link from Special:Version, gitblit shows me a pretty old revision
[15:22:39] https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FContentTranslation.git/f1469a6f07c5f00e41177cb5d3c419e2bb477ccd
[15:22:43] is that right?
[15:23:03] jynus, akosiaris deployed proper redis, but the data was in my user instance
[15:23:18] aharoni: I'm pretty sure there's a phab ticket filed for that, the code should definitely be synced
[15:23:37] operations, Labs, Labs-Sprint-107, Labs-Sprint-108, and 2 others: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1535333 (Andrew) labvirt1001, 1002, 1009 done.
[15:23:41] that sucks a lot.
[15:23:47] OK, trying...
[15:23:54] I am having some lag issues, yurik, give some seconds
[15:23:56] aharoni: Just try :)
[15:24:26] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1535341 (mmodell) @gwicke: I think I can speak for #releng here: We do not have any desire to block ansib...
[15:25:40] aharoni: thcipriani works for me (autosave as per test plan)
[15:26:30] kart_: thanks!
[15:28:17] yurik, done
[15:28:24] jynus, thx!
[15:29:58] thcipriani: kart_ - all good. CX SWAT done?
[15:30:53] aharoni: cool, thanks. Should be done.
[15:31:12] thcipriani, kart_ - grazie
[15:33:52] aharoni: (y)
[15:36:37] (CR) Joal: [C: 1] "LGTM !" [puppet] - https://gerrit.wikimedia.org/r/230825 (https://phabricator.wikimedia.org/T108339) (owner: Mforns)
[15:38:36] (CR) Dzahn: [C: 1] "looks good, on dbstore1001 i only see 3306 open (besides standard stuff). the 3307 is not in use here but DBAs wanted it so that's fine. t" [puppet] - https://gerrit.wikimedia.org/r/228237 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[15:41:01] operations, Security-Team: can we get rid of rsvg security patch? - https://phabricator.wikimedia.org/T104147#1535406 (Krenair) >>! In T104147#1535187, @Aklapper wrote: >> Newer librsvg supports a sane security model by default > > Does anyone know which version number resembles "newer" or has some upstre...
[15:41:12] (CR) Dzahn: [C: 1] "on dbproxy1001, there is only haproxy on 3306. the 3307 port is not needed but was requested by DBAs so that's fine. the exception for neo" [puppet] - https://gerrit.wikimedia.org/r/228239 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[15:42:57] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[15:43:31] operations, Security-Team: can we get rid of rsvg security patch? - https://phabricator.wikimedia.org/T104147#1535408 (Krenair) Some historical info on T40010 and T80392
[15:44:29] (CR) Dzahn: "i don't want to get into deployment system discussions. if others want to do it, fine, but i'll be" [debs/adminbot] - https://gerrit.wikimedia.org/r/231151 (owner: Merlijn van Deen)
[15:44:42] main issue for dbstores are not the ports, but the source ips
[15:46:41] if other teams continue ignoring me and my request for help to understand their database usage, I would say they should not be surprised about finding one day a firewall blocking their access
[15:48:22] operations, Analytics-Engineering, Privacy: Honor DNT header for access logs & varnish logs - https://phabricator.wikimedia.org/T98831#1535432 (Krenair)
[15:49:59] (CR) Dzahn: "i still wouldn't know how fabric deploy is used or how to review this script. building the package is exactly 1 command line, adding it to" [debs/adminbot] - https://gerrit.wikimedia.org/r/231151 (owner: Merlijn van Deen)
[15:52:13] jynus, sorry, it seems our redis is using a different file - can you do the same (stop redis first) to the /srv/redis/maps... file
[15:53:04] jynus, make sure you rename my /home/yurik/redis/dump.rdb to match the old name
[15:53:33] ottomata: https://gerrit.wikimedia.org/r/#/c/229716/
[15:55:00] ja mutante sorry, was bound up with kafka stuff, am freeer today, but have to run an errand and then have som emeetings. i have to double check some things with that.
[15:55:13] (CR) Ottomata: "I need to double check some things. WIll look at this later today." [puppet] - https://gerrit.wikimedia.org/r/229716 (owner: Muehlenhoff)
[15:55:18] ottomata: ok, thanks
[15:55:56] gwicke, hi, it seems cassandra is using 60% cpu without any users. any thoughts? https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&c=maps+Cluster+codfw&tab=m&vn=&hide-hf=false&sh=1
[15:56:04] (PS2) Dzahn: elasticsearch: add cluster hosts to hiera [puppet] - https://gerrit.wikimedia.org/r/230955 (https://phabricator.wikimedia.org/T104962)
[15:56:17] operations, Labs, Multimedia, wikitech.wikimedia.org, and 2 others: Some wikitech.wikimedia.org thumbnails broken (404) - https://phabricator.wikimedia.org/T93041#1535441 (Krenair) I checked several other images and most of them showed the same error.
[15:56:41] (CR) Dzahn: [C: 2] elasticsearch: add cluster hosts to hiera [puppet] - https://gerrit.wikimedia.org/r/230955 (https://phabricator.wikimedia.org/T104962) (owner: Dzahn)
[15:58:07] (CR) Dzahn: [C: 1] "cluster hosts added in https://gerrit.wikimedia.org/r/#/c/230955/" [puppet] - https://gerrit.wikimedia.org/r/224095 (https://phabricator.wikimedia.org/T104962) (owner: Muehlenhoff)
[15:58:53] (CR) Dzahn: [C: 1] Add ferm rules for Logstash/Elasticsearch [puppet] - https://gerrit.wikimedia.org/r/227960 (https://phabricator.wikimedia.org/T104964) (owner: Muehlenhoff)
[15:59:59] (CR) Legoktm: "We should just undeploy WikiGrok and be done with it." [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[16:00:04] jdlrobson kaldari codezee: Respected human, time to deploy Special deploy window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150813T1600). Please do the needful.
[16:00:17] \o
[16:00:22] (CR) Chad: "-1. This should probably be stored in the role's hiera, not in the actual module hiera. It makes it harder to reuse things between here an" [puppet] - https://gerrit.wikimedia.org/r/230955 (https://phabricator.wikimedia.org/T104962) (owner: Dzahn)
[16:00:33] mutante: Already merged, but -1 after the fact....
[16:00:34] yurik: no, I did not shutdown your instance
[16:00:41] operations, Discovery-Maps-Sprint: git deploy shows 5 tilerator instances instead of 4 - https://phabricator.wikimedia.org/T108956#1535453 (Yurik) NEW
[16:00:52] akosiaris, it did automatically than
[16:01:23] (CR) Dzahn: "how is it super obvious that we use hostnames from 01 to 31? where else would the system know that from? DNS?" [puppet] - https://gerrit.wikimedia.org/r/230955 (https://phabricator.wikimedia.org/T104962) (owner: Dzahn)
[16:01:28] yurik: start yours once more please
[16:01:31] akosiaris, i'm not sure if jynus moved it yet or not - basically need to migrate the data from my private instance in /home/yurik/redis to the /srv/redis/...
[16:02:11] akosiaris, its running - on port 7777
[16:02:18] redis-cli -p 7777
[16:02:23] ostriches: i don't follow.. "role hiera" ?
[16:02:42] it's hieradata/role/common/
[16:02:44] akosiaris, from the net - simply stop the service, and copy the data file over
[16:02:44] It should be in role::elasticsearch::server::something, not in elasticsearch::
[16:03:01] ostriches: the variable name??
[16:03:07] yurik: your redis is not the same as the server's redis
[16:03:30] so that might not work
[16:03:38] mutante: Perhaps. Maybe it's the /common/ bit that irks me more.
[16:03:55] Since it's not actually common, but eqiad prod elasticsearch.
[16:04:10] akosiaris, try it - if it fails, i will copy over using another method.
[16:04:17] its just much messier
[16:04:35] yurik: how about if it seems to work but actually messes up stuff ?
[16:04:47] akosiaris, it won't don't worry
[16:04:58] my role is to worry for you since you don't
[16:05:02] nothing to mess up - either does, or doesn't :)
[16:05:08] thanks :)
[16:05:21] i worry, i just don't show it ;)
[16:05:27] ostriches: there is role/eqiad/ too. exactly one other services uses it. if i would have used any other file except the existing one, i could swear people would tell me to put it all in once place
[16:05:34] (CR) Florianschmidtwelzow: "-> T108957" [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[16:05:52] * ostriches shrugs
[16:05:57] yurik: maybe finish whatever you were doing on that instance anyway
[16:05:58] ?
[16:06:05] since it's running it should be ok
[16:06:20] and when it's done, move over to the actual tilerator instance ?
[16:06:35] akosiaris, the moment redis went down when you installed it, it stopped doing it
[16:06:45] (PS1) Alex Monk: Add multimedia packages (i.e. ghostscript for pdfinfo) to silver [puppet] - https://gerrit.wikimedia.org/r/231293 (https://phabricator.wikimedia.org/T93041)
[16:06:57] your puppet stopped the existing redis because it was on the same port (i'm guessing)
[16:07:03] (PS2) Alex Monk: Add multimedia packages (e.g. ghostscript for pdfinfo) to silver [puppet] - https://gerrit.wikimedia.org/r/231293 (https://phabricator.wikimedia.org/T93041)
[16:07:12] i saw a graceful shutdown in the tmux
[16:07:23] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1535486 (ArielGlenn) Adding @Krenair so we can hash out the sort of access actually needed.
[16:07:48] ExecStop=/usr/bin/redis-cli shutdown
[16:07:49] ahaha
[16:08:07] akosiaris, and another thing - i really need the disable functionality - i think i have messed up tilerator again, so don't want it to go up and down until i fix it
[16:08:12] yurik: serves you right for running a user redis on the standard port
[16:08:22] apergos, do you want me to say something extra there?
[16:09:20] yes, I want to have the discussion (including whatever alternatives you might suggest) on the ticket if you don' tmind
[16:09:30] Krenair:
[16:09:34] I'm just saying that it seems silly to grant full restricted access (i.e. mediawiki database access, ability to run maintenance scripts, and many other things) when all you really being asked for is ability to view logs on fluorine
[16:09:44] bd808, does all of that go to logstash these days?
[16:10:36] not disagreeing, just looking for a) alternative and b) let's have it in the ticket where we will be able to see it later
[16:10:37] Krenair: many but not all logs available on fluorine are in logstash
[16:10:44] (PS1) Florianschmidtwelzow: Undeploy WikiGrok from wmf wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/231294 (https://phabricator.wikimedia.org/T108957)
[16:11:01] (PS2) Jforrester: Enable VisualEditor for 20% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/227330
[16:11:01] * legoktm hugs FlorianSW
[16:11:19] FlorianSW: .ßo/
[16:11:27] what's that :/
[16:11:32] legoktm: \o/
[16:12:09] let's find out which ones he needs, see if they are in logstash, and if not see whether it would make sense to add them
[16:12:13] how does that sound?
[16:12:22] !log sync deployed tilerator
[16:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:12:38] Krenair:
[16:12:50] great
[16:13:04] the logs on fluorine are world readable, so he'd really just need shell there
[16:13:07] (PS3) Jforrester: Enable VisualEditor for 25% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/227330
[16:13:10] yurik: I think I 've moved redis data over successfully
[16:13:19] since you didn't write on the ticket I"m just going to quote you again form irc
[16:13:33] yurik: I see 193 q:job and other things
[16:13:36] excellente! i can see them!
[16:13:58] now on to fix the service
[16:14:43] that "restricted" group should be renamed "mess with anything on any wiki"
[16:14:46] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1535539 (ArielGlenn) From irc: Krenair: I'm just saying that it seems silly to grant full restricted access (i.e. mediawiki database access, ability to run maintena...
[16:14:51] mobrovac: any lick with mobileapps on scb ?
[16:14:52] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1535541 (Krenair) I have no particular objections to this request, am just saying that maybe full 'restricted' access (ability to sudo as apache/www-data, access to...
[16:15:22] akosiaris: there are trebuchet-love messages waiting for you over in #sec
[16:15:47] legoktm: is there any examples for your comment on https://gerrit.wikimedia.org/r/#/c/231032/1/wmf-config/extension-list ?
[16:15:57] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1535570 (ArielGlenn) So @TJones can you let us know which logs you need to see? If they aren't available in logstash we'll see whether they should be added; if not...
[16:16:10] i'm not sure if you are saying put extension.json in here instead of WikidataPageBanner.php and i can't find an example to go off of
[16:16:28] bd808, it does more than that. See the ticket I linked to
[16:16:45] (PS3) Florianschmidtwelzow: Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936)
[16:16:57] also, restricted access does not grant you access to the labswiki DB whereas deployment does :)
[16:17:20] jdlrobson: https://gerrit.wikimedia.org/r/#/c/230700/2/wmf-config/extension-list-labs,cm
[16:17:39] (PS4) Florianschmidtwelzow: Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936)
[16:18:16] mobrovac: abort the previous git deploy please
[16:18:49] akosiaris: ups, sorry, {{ done }}
[16:19:03] That access ticket is going to come back with "Oliver has this access and I do the same things he does"
[16:19:43] maybe so
[16:20:12] things = look at logs
[16:20:49] greg-g, mutante: sshing to the test.wiki server doesn't seem to be working...
[16:20:51] ssh: Could not resolve hostname mw1017.wqiad.wmnet: Name or service not known
[16:21:00] trying from tin
[16:21:10] wqiad ?
[16:21:11] (PS2) Jdlrobson: Deploy WikidataPageBanner extension [mediawiki-config] - https://gerrit.wikimedia.org/r/231032 (https://phabricator.wikimedia.org/T98029)
[16:21:11] "wqiad.wmnet"?
[16:21:12] ironholds also has researchers, statistics-privatedata-users, statistics-users, statistics-admins, and analytics-privatedata-users
[16:21:16] thanks legoktm hope i did it right..
[16:22:23] bd808: sorry, the docs had a typo
[16:22:30] does it work now? :)
[16:22:31] kaldari: problem #1 is that key forwarding doesn't work, so you can't leap from tin. Problem #2 may be the host name: ssh mw1017.eqiad.wmnet
[16:22:37] you can leap from tin
[16:22:46] (CR) Chad: [C: 2] Undeploy WikiGrok from wmf wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/231294 (https://phabricator.wikimedia.org/T108957) (owner: Florianschmidtwelzow)
[16:22:49] SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@mw1017
[16:22:53] (Merged) jenkins-bot: Undeploy WikiGrok from wmf wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/231294 (https://phabricator.wikimedia.org/T108957) (owner: Florianschmidtwelzow)
[16:22:55] true
[16:23:07] FlorianSW: Doing that now
[16:23:19] ostriches: great, thanks :)
[16:23:19] (CR) Chad: [C: 2] Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[16:23:26] (Merged) jenkins-bot: Remove unused config options [mediawiki-config] - https://gerrit.wikimedia.org/r/231257 (https://phabricator.wikimedia.org/T108936) (owner: Florianschmidtwelzow)
[16:23:31] I don't use this though, I just connect normally
[16:23:47] akosiaris, tilerator cassandra account is not connecting
[16:23:56] yurik: error ?
[16:24:01] wrong psw
[16:24:11] i tried manually - also doesn't work
[16:24:22] i used the value that was generated into the config file
[16:24:28] !log demon@tin Synchronized wmf-config/: undeploy wikigrok (duration: 00m 12s)
[16:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:24:59] FlorianSW: All done, closed the task.
[16:25:08] akosiaris, on the other hand, kartotherian account works ok
[16:25:14] I will always jump at the chance to *un*deploy code! ;-)
[16:25:26] (CR) Merlijn van Deen: "With fabric, deployment is as simple as running" [debs/adminbot] - https://gerrit.wikimedia.org/r/231151 (owner: Merlijn van Deen)
[16:25:46] ostriches: thanks for this (really really) quick deploy :)
[16:25:49] yurik: indeed, lemme fix that
[16:25:56] (or undeploy...)
[16:25:58] PROBLEM - torrus.wikimedia.org HTTP on netmon1001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string Torrus Top: Wikimedia not found on http://torrus.wikimedia.org:80/torrus - 289 bytes in 0.289 second response time
[16:25:59] akosiaris, and another thing - for some reason all 4 machines have their cpu at 60 (cassandra), without any clients
[16:26:20] operations, ops-eqiad: db1050 raid degraded - https://phabricator.wikimedia.org/T103110#1535640 (Cmjohnson) I didn't notice this before but db1049/1050 are 600GB SAS drives not 300GB like the earlier batches of db servers. I am not going to be able to pull a disk from one of the recent decom'd servers....
[16:27:55] (CR) Legoktm: "Minor thing, otherwise this looks good to go once the remaining blockers are fixed." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/231032 (https://phabricator.wikimedia.org/T98029) (owner: Jdlrobson)
[16:29:02] !log kaldari@tin Synchronized php-1.26wmf18/extensions/WikidataPageBanner: (no message) (duration: 00m 12s)
[16:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:29:48] !log kaldari@tin Synchronized php-1.26wmf17/extensions/WikidataPageBanner: (no message) (duration: 00m 12s)
[16:29:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:30:00] second try :)
[16:30:05] operations, MediaWiki-extensions-TimedMediaHandler, Multimedia, HHVM: Convert tmh100[12] to HHVM and trusty - https://phabricator.wikimedia.org/T104747#1535658 (brion)
[16:30:36] (CR) Dzahn: "it still has my name on it but amending/editing means i have not written the commit message or the code anymore" [puppet] - https://gerrit.wikimedia.org/r/223844 (https://phabricator.wikimedia.org/T104970) (owner: Dzahn)
[16:30:49] !log kaldari@tin Synchronized php-1.26wmf17/.gitmodules: (no message) (duration: 00m 13s)
[16:30:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:31:07] !log kaldari@tin Synchronized php-1.26wmf18/.gitmodules: (no message) (duration: 00m 11s)
[16:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:31:19] (PS3) Jdlrobson: Deploy WikidataPageBanner extension [mediawiki-config] - https://gerrit.wikimedia.org/r/231032 (https://phabricator.wikimedia.org/T98029)
[16:31:38] operations, Continuous-Integration-Infrastructure, Multimedia, Patch-For-Review: Investigate impact of switching from ffmpeg to libav (ffmpeg is not in Jessie) - https://phabricator.wikimedia.org/T103335#1535660 (brion) Ok closed out T69953 as the updated ffmpeg2theora gets things working. I think w...
[16:32:20] (CR) Kaldari: [C: 2] Deploy WikidataPageBanner extension [mediawiki-config] - https://gerrit.wikimedia.org/r/231032 (https://phabricator.wikimedia.org/T98029) (owner: Jdlrobson)
[16:32:45] (Merged) jenkins-bot: Deploy WikidataPageBanner extension [mediawiki-config] - https://gerrit.wikimedia.org/r/231032 (https://phabricator.wikimedia.org/T98029) (owner: Jdlrobson)
[16:33:11] kaldari: umm, what?
[16:33:23] kaldari: that bug still has open blockers
[16:33:24] mobrovac, is there an easy way to reduce the number of workers in a service (in a puppet)? it seems like it auto-generates NCPU value
[16:34:16] akosiaris, don't restart the service yet, or better yet - disable tilerator outright ^ otherwise it will create too many instances
[16:35:43] jdlrobson: what's the status of the blockers against https://phabricator.wikimedia.org/T98029?
[16:35:50] (PS1) Alexandros Kosiaris: maps: Fix typo in maps.cql.erb [puppet] - https://gerrit.wikimedia.org/r/231296
[16:36:17] i don't see them as big blockers right now as we are not enabling wikidata banners just yet
[16:36:17] twentyafterfour: can I ask you, how you get the changelog for special_extensions like Wikidata (e.g. in 1.26wmf18)? https://www.mediawiki.org/wiki/MediaWiki_1.26/wmf18/Changelog#Wikidata
[16:36:20] yurik: ok, just disabled it and ^ this is the reason for the problem
[16:36:25] damn typo
[16:36:28] hehe )
[16:36:38] akosiaris, do i have the right to enable/disable now?
[16:36:48] nope, not yet
[16:37:04] not sure if the image is a blocker - haven't seen that before - there's a pending patch just waiting for a +2 from legoktm - https://gerrit.wikimedia.org/r/#/c/230643/
[16:37:21] i'm not expecting the banners to be used in mass until at earliest next week
[16:37:24] so there's still time for bug fixes
[16:37:32] uhh, it's waiting on me? :/
[16:37:36] (CR) Alexandros Kosiaris: [C: 2] maps: Fix typo in maps.cql.erb [puppet] - https://gerrit.wikimedia.org/r/231296 (owner: Alexandros Kosiaris)
[16:38:31] akosiaris, do you know how to fix the NCPU in - puppet/modules/service/templates/node/config.yaml.erb
[16:38:36] we need to make this a parameter
[16:39:25] FlorianSW: I'm not sure
[16:39:34] :)
[16:39:45] mobrovac or gwicke ^^^
[16:40:03] the make-deploy-notes script in tools/release does it
[16:40:12] kaldari: i updated the wikidata task
[16:40:16] jdlrobson: if you don't think they're blockers, they should be removed with appropriate justification. But I'll argue that usage tracking is a blocker after the issues we saw when Wikidata didn't have it. Anyways, I'm reviewing the patch now
[16:40:32] (PS1) BBlack: mobile-frontend: remove support for dead wap/mobile subdomains [puppet] - https://gerrit.wikimedia.org/r/231298
[16:40:34] (PS1) BBlack: mobile-frontend: remove dead 666-redirect handler [puppet] - https://gerrit.wikimedia.org/r/231299
[16:40:42] legoktm: so do you think your bug is a blocker? if so why?
[16:40:56] twentyafterfour: do you run make-deploy-notes before you push to the special branch? (if I run make-deploy-notes I don't get the wikidata changelog at all)
[16:41:20] jdlrobson, legoktm: We could also just turn it on as a test for today and turn it back off for bug fixes. I don't have any horse in this race, though. I'm just deploying it as a favor :)
[16:41:38] jdlrobson: because commons heavily relies on globalusage to identify whether images are used, and if not they tend to delete them.
[16:41:49] yurik: no, unfortunately I have no idea what that setting is
[16:41:55] mobrovac: ^
[16:42:01] legoktm: right now though the community just want to test out the banners and compare them to their existing templates
[16:42:10] they are not planning to switch over just yet
[16:42:17] i will assure you we will get that fixed before that happens
[16:42:27] FlorianSW: I run it from the new branch after all the submodules are initialized, I do `~/src/release/make-deploy-notes/uploadChangelog.php wmf/$VERSION`
[16:42:33] okay...
[16:43:11] i'll take a look at testing https://gerrit.wikimedia.org/r/#/c/230643/ today
[16:43:53] twentyafterfour: very strange :/ Thanks for your info nonetheless, I thought you get it from any special place :)
[16:44:31] jdlrobson, legoktm: OK to proceed with deployment?
[16:44:36] (PS1) Alexandros Kosiaris: maps: Add ALTERs to ensure passwords [puppet] - https://gerrit.wikimedia.org/r/231301
[16:44:48] I guess.
[16:45:56] thumbs up from me
[16:46:00] i've recorded this on the task
[16:46:33] jdlrobson: so you don't need wgWPBBannerProperty set for enwikivoyage?
[16:46:46] nope because we are not launching wikidata integration just yet
[16:47:06] cool, just checking :)
[16:47:25] hey Nicolas_Raoul :)
[16:47:38] Hi :-) Watching the historic moment
[16:47:59] operations, Beta-Cluster, Traffic, Patch-For-Review: Upgrade beta-cluster caches to jessie - https://phabricator.wikimedia.org/T98758#1535781 (demon)
[16:48:23] yurik: akosiaris: what about ncpu?
[16:48:34] (sorry, wsa in a meeting)
[16:48:52] operations, Beta-Cluster, Traffic, Patch-For-Review: Upgrade beta-cluster caches to jessie - https://phabricator.wikimedia.org/T98758#1276698 (demon) No complaint from browser testing folks, assuming that's fine. Decom'd the old instances now. I guess technically we need to check purges are workin...
[16:49:48] Nicolas_Raoul: first extension to launch on Wikivoyage first :)
[16:50:01] !log kaldari@tin Synchronized wmf-config/extension-list: syncing extension-list for WikidataPageBanner (duration: 00m 11s)
[16:50:08] mobrovac: I think yurik has some question about it being configurable
[16:50:56] !log kaldari@tin Synchronized wmf-config/InitialiseSettings.php: syncing InitialiseSettings for WikidataPageBanner (duration: 00m 12s)
[16:51:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:51:10] !log kaldari@tin Synchronized wmf-config/CommonSettings.php: syncing CommonSettings for WikidataPageBanner (duration: 00m 12s)
[16:51:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:52:03] jdlrobson: loooks like it's borked :(
[16:52:10] https://en.wikivoyage.org/wiki/Main_Page
[16:52:12] kaldari: what happened?
[16:52:23] errr i thought you were deploying to test first?
[16:52:33] on sec
[16:52:36] ...revert?
[16:52:38] mutante: Oh, that is only jmxtrans-jmx?
[16:52:42] mobrovac: I also see no problem in scb100{1,2} git sha hashes. they got the same as tin
[16:52:44] [8c7c9fc4] 2015-08-13 16:52:30: Fatal exception of type "MWException"
[16:52:53] (PS1) Jcrespo: Update comment about database disk size for db1050 and db1049 [mediawiki-config] - https://gerrit.wikimedia.org/r/231303
[16:52:58] akosiaris: lemme check
[16:53:03] no spark servcies, sorry
[16:53:05] that is an easy peasy one
[16:53:09] (PS3) Ottomata: Add ferm rules for spark/jmxtrans [puppet] - https://gerrit.wikimedia.org/r/229716 (owner: Muehlenhoff)
[16:53:14] ottomata: yes, that one is only 2101
[16:53:17] (CR) Ottomata: [C: 2] Add ferm rules for spark/jmxtrans [puppet] - https://gerrit.wikimedia.org/r/229716 (owner: Muehlenhoff)
[16:53:22] kaldari: you need to scap.
[16:53:23] operations, Wikimedia-Site-Requests, Patch-For-Review: Rename "chapcomwiki" to "affcomwiki" - https://phabricator.wikimedia.org/T41482#1535849 (ArielGlenn) it's not yet in wikiversions.json so I guess the answer to the previous question is "no".
[16:53:28] ottomata: i could confirm it's a java port, but it's not like i actually connected
[16:53:29] Error: invalid magic word 'PAGEBANNER'
[16:53:42] ottomata: :)
[16:53:44] ja that one is fine
[16:53:47] thx
[16:54:00] kaldari: revert to unbreak the site, add to extension-list first, scap, then actually enable in InitialiseSettings.php
[16:54:09] actually, i almost thing we should put that in the jmxtrans module somehow, as its own class, and maybe auto include it if ferm is defined in some clever way
[16:54:26] !log kaldari@tin Synchronized wmf-config/InitialiseSettings.php: syncing InitialiseSettings for WikidataPageBanner off (duration: 00m 11s)
[16:54:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:54:36] ok, wikivoyage is back again
[16:54:42] that same ferm rule is on any host that includes jmxtrans
[16:54:46] in exactly the same way
[16:54:53] yurik: do you need to set ncpu explicitly, or is it okay to get workers?
[16:54:59] legoktm: OK, didn't know I needed to scap
[16:55:27] scap is needed for any localisation changes, and I forgot that magic words need it too :/
[16:55:47] gwicke: yurik: it is set to npu now, i'm guessing he needs a number there, i
[16:55:52] i'll fix that up
[16:55:55] !log kaldari@tin Started scap: (no message)
[16:56:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:56:21] mobrovac: kk, thx!
[16:56:25] ottomata: that sounds reasonable, but it would have to be a role/jmxtrans then
[16:56:53] gwicke, i need a divider of ncpu
[16:56:57] mobrovac, ^
[16:57:02] e.g. NCPU/2
[16:57:19] akosiaris: have you restarted the service? I can see the code delivering the spec, but curl still gives 404s
[16:57:31] yurik: ah, it's the ticket you created
[16:57:57] so as you may assume, the functionality is not there for that
[16:58:10] yurik: so can't put ncou/2 in the prod config
[16:58:14] now
[16:58:21] Wikivoyage seems perfectly normal, I tried purging and loading a few pages
[16:58:44] yurik: i can change the prod config to allow you to specify a number, but no expressions (for now)
[16:58:47] mobrovac, correct - ncpu is too high for us
[16:59:16] mobrovac, that would work too - as long as i can control via puppet the number for each instance
[16:59:17] well, the jmxtrans config is always service specific
[16:59:26] so hard to make a role for it
[17:00:36] yurik: kk, will adapt the puppet module for that
[17:00:37] ottomata: if it's always service specific it sounds the current solution isn't so bad after all
[17:00:53] yeah its fine
[17:01:02] mobrovac: ah, fixed... kind of
[17:01:05] now /{domain}/v1/page/mobile-html/{title} is CRITICAL: Test retrieve en.wp main page via mobile-html returned the unexpected status 503 (expecting: 200)
[17:01:26] damn
[17:02:16] operations, Discovery-Maps-Sprint: git deploy shows 5 tilerator instances instead of 4 - https://phabricator.wikimedia.org/T108956#1535893 (ArielGlenn) the syncs are to maps-test2001 through 4, plus tin. tin is the one that now no longer gets the deployment. is it possible that tin had the grain on there...
[17:02:18] akosiaris, in that case, pls keep the tilerator off for now, i will run it in manual until we have workers-count configurable per instance
[17:02:24] mobrovac, thx!
[17:02:56] mutante: merged.
[17:03:19] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1535899 (GWicke) > Essentially ansible requires full root sudo capability on the target machine. That de...
[17:03:19] ottomata: cool, thx
[17:05:11] akosiaris: auf, my bad, forgot to add restbase's port to the config, fixing now
[17:07:23] (PS1) Mobrovac: MobileApps: Add the port to the restbase_uri config [puppet] - https://gerrit.wikimedia.org/r/231308
[17:07:28] akosiaris: ^^
[17:11:38] PROBLEM - etherpad.wikimedia.org HTTP on etherpad1001 is CRITICAL - Socket timeout after 10 seconds
[17:15:26] RECOVERY - etherpad.wikimedia.org HTTP on etherpad1001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 522 bytes in 0.007 second response time
[17:19:25] yurik: is that for tilerator?
[17:19:33] tiler8r
[17:19:36] mobrovac, yep
[17:19:38] hehe
[17:19:52] yurik: how many to put? so i can include it in the patch right away
[17:20:17] mobrovac, that's the point - we can't have identical - we have two differnt types of machines
[17:20:32] for 2001 & 2003 - 4, for the 2002 & 2004 - 6
[17:20:47] has to be a var in the tilerator puppet
[17:20:51] you are a complicated man to satisfy
[17:21:20] mobrovac, i work with the complicated machines i got - you work with complication derivatve of that ;)
[17:21:26] haha
[17:21:58] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1536018 (EBernhardson) The one that came up useful most recently was the api.log. Trey found some interesting patterns while analyzing our zero results data but did...
[17:23:13] (CR) Ori.livneh: [C: 1] diamond: add upstart/systemd service stats [puppet] - https://gerrit.wikimedia.org/r/224093 (https://phabricator.wikimedia.org/T108027) (owner: Filippo Giunchedi)
[17:25:58] (PS5) BryanDavis: kibana: make compatible with Apache 2.4 [puppet] - https://gerrit.wikimedia.org/r/230692 (owner: Dzahn)
[17:27:14] (PS3) Ottomata: Change percentage in EventLogging validation alert [puppet] - https://gerrit.wikimedia.org/r/230825 (https://phabricator.wikimedia.org/T108339) (owner: Mforns)
[17:27:43] (CR) BryanDavis: "Added templates/kibana/apache-auth-none.erb and some "Order Allow,Deny" directives." [puppet] - https://gerrit.wikimedia.org/r/230692 (owner: Dzahn)
[17:28:15] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1536057 (TJones) @ArielGlenn, sorry I haven't been more forthcoming. I've been trying to figure out the answer myself. I'm relatively new to Wikimedia. I'm on the D...
[17:28:26] (CR) Ottomata: [C: 2] Change percentage in EventLogging validation alert [puppet] - https://gerrit.wikimedia.org/r/230825 (https://phabricator.wikimedia.org/T108339) (owner: Mforns)
[17:34:09] Krenair: T71548 seems easy.
[17:34:28] yep
[17:34:30] Er, legoktm too
[17:34:58] rm -rf AFTv5 plz
[17:35:04] (PS2) Alexandros Kosiaris: MobileApps: Add the port to the restbase_uri config [puppet] - https://gerrit.wikimedia.org/r/231308 (owner: Mobrovac)
[17:35:14] (CR) Alexandros Kosiaris: [C: 2 V: 2] MobileApps: Add the port to the restbase_uri config [puppet] - https://gerrit.wikimedia.org/r/231308 (owner: Mobrovac)
[17:35:36] just runBatchedQuery.php I guess
[17:36:31] Didn't really need to, they all were small enough and completed in well <1s
[17:36:35] yurik: ok leaving tilerator disabled then
[17:36:46] akosiaris, thx
[17:36:49] i will run them by hand
[17:37:09] akosiaris, i updated that sudo grant task with a list of all the stuff we needed
[17:37:25] akosiaris, https://phabricator.wikimedia.org/T106637
[17:37:53] operations, Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#1536095 (ArielGlenn) I talked with ori about this, after checking the puppet manifests. I could not see either where grafana passes creds nor where it is lmited in the data it gets. So here's the story: grafana uses...
[17:38:20] yurik: ah, nice
[17:38:21] thanks
[17:38:47] ostriches: that was fast :D
[17:40:36] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1536097 (thcipriani) >>! In T107532#1535899, @GWicke wrote: >> Essentially ansible requires full root sud...
[17:41:07] (PS1) Ori.livneh: xenon: small tweaks to code clarity and variable names [puppet] - https://gerrit.wikimedia.org/r/231315
[17:41:41] (CR) Ori.livneh: [C: 2 V: 2] xenon: small tweaks to code clarity and variable names [puppet] - https://gerrit.wikimedia.org/r/231315 (owner: Ori.livneh)
[17:41:45] Ops-Access-Requests, operations, Patch-For-Review: Access to fluorine for Trey Jones (tjones) - https://phabricator.wikimedia.org/T108696#1536112 (ArielGlenn) I'll find out about the api logs at least and see what's going on. Thanks!
[17:44:12] mobrovac: change merged, service restarted, no chage in icinga checks
[17:44:16] change*
[17:44:37] operations, Discovery, Maps, Services, and 2 others: Puppetize Tilerator for deployment - https://phabricator.wikimedia.org/T105074#1536121 (Yurik)
[17:44:39] Blocked-on-Operations, operations, Discovery, Maps, and 2 others: Add Redis to maps cluster - https://phabricator.wikimedia.org/T107813#1536117 (Yurik) Open>Resolved a:Yurik thanks! As a separate task we might want to consider a second backup instance, but imho not a big deal for now.
[17:44:42] damn (yet again)
[17:46:01] !log kaldari@tin Finished scap: (no message) (duration: 50m 05s)
[17:46:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:46:16] akosiaris: not quite the same, we get 403s from url-downloader now for restbase.svc.eqiad.wmnet:7231
[17:46:32] url-downloader ?
[17:46:43] why is url-downloader used to access an internal IP ?
[17:46:57] (PS1) Ori.livneh: xenon: fix typo introduced in I1c228ea75 [puppet] - https://gerrit.wikimedia.org/r/231318
[17:47:10] in fact, an LVS IP
[17:47:11] (CR) Ori.livneh: [C: 2 V: 2] xenon: fix typo introduced in I1c228ea75 [puppet] - https://gerrit.wikimedia.org/r/231318 (owner: Ori.livneh)
[17:47:23] !log kaldari@tin Synchronized wmf-config/InitialiseSettings.php: syncing InitialiseSettings for WikidataPageBanner (duration: 00m 12s)
[17:47:32] internal LVS IP to be more exact
[17:47:39] there I needed 3 tries to get that right
[17:49:05] operations, Discovery-Maps-Sprint: git deploy shows 5 tilerator instances instead of 4 - https://phabricator.wikimedia.org/T108956#1536135 (akosiaris) not really. I 've never added the role to it
[17:49:13] akosiaris: well, because we haven't gotten around fixing that "please, stop using url-downloader for internal domains" bug :/
[17:49:49] ahah, this one https://gerrit.wikimedia.org/r/#/c/207490/1
[17:50:06] well, the info on how to fix this is in that patch
[17:50:26] more precisely is in network::constants
[17:51:11] (PS3) Dduvall: contint: Install chromedriver for running MW-Selenium tests [puppet] - https://gerrit.wikimedia.org/r/223691 (https://phabricator.wikimedia.org/T103039)
[17:51:41] (PS1) Mobrovac: service::node: Make the number of workers to start configurable [puppet] - https://gerrit.wikimedia.org/r/231319 (https://phabricator.wikimedia.org/T108888)
[17:52:42] (CR) Ori.livneh: [C: 2 V: 2] Increase the log level of error logs to LOG_NOTICE [debs/nutcracker] - https://gerrit.wikimedia.org/r/228876 (owner: Ori.livneh)
[17:53:07] yurik: akosiaris: https://gerrit.wikimedia.org/r/#/c/231319/ for changing ncpu
[17:53:27] ACKNOWLEDGEMENT - mobileapps endpoints health on scb1002 is CRITICAL: /{domain}/v1/page/mobile-html/{title} is CRITICAL: Test retrieve en.wp main page via mobile-html returned the unexpected status 403 (expecting: 200) alexandros kosiaris known, code wrongly uses url-downloader
[17:53:30] operations, Discovery-Maps-Sprint: git deploy shows 5 tilerator instances instead of 4 - https://phabricator.wikimedia.org/T108956#1536144 (ArielGlenn) well at some point tin made it into the redis db and it's not a target for the sync. it could just be removed. root@palladium:/usr/share/pyshared/salt# s...
[17:53:51] ACKNOWLEDGEMENT - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/mobile-html/{title} is CRITICAL: Test retrieve en.wp main page via mobile-html returned the unexpected status 403 (expecting: 200) alexandros kosiaris known, code wrongly uses url-downloader
[17:54:12] yurik: hopefully by early next week i'll be able to get you the ncpu/2 syntax too
[17:55:04] (PS2) Dzahn: fermium: add rsync server to sync from sodium [puppet] - https://gerrit.wikimedia.org/r/231190
[17:56:32] (CR) Dzahn: [C: 1] service::node: Make the number of workers to start configurable [puppet] - https://gerrit.wikimedia.org/r/231319 (https://phabricator.wikimedia.org/T108888) (owner: Mobrovac)
[17:56:51] (CR) Dzahn: [C: 2] fermium: add rsync server to sync from sodium [puppet] - https://gerrit.wikimedia.org/r/231190 (owner: Dzahn)
[17:57:28] PROBLEM - puppet last run on cp3039 is CRITICAL puppet fail
[18:00:04] twentyafterfour greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150813T1800). Please do the needful.
[18:02:24] chasemp, godog: Could we merge https://gerrit.wikimedia.org/r/#/c/230922/ today? It's working well in beta cluster.
[18:02:44] (PS5) Ori.livneh: logstash: normalize "level" fields across log types [puppet] - https://gerrit.wikimedia.org/r/230922 (owner: BryanDavis)
[18:02:54] (CR) Ori.livneh: [C: 2] logstash: normalize "level" fields across log types [puppet] - https://gerrit.wikimedia.org/r/230922 (owner: BryanDavis)
[18:03:02] thanks ori
[18:04:28] I <3 jouncebot
[18:05:59] operations, Deployment-Systems, RESTBase, Services, Patch-For-Review: [Discussion] Move restbase config to Ansible (or $deploy_system in general)? - https://phabricator.wikimedia.org/T107532#1536207 (GWicke) @thcipriani: I see, that is indeed a bit annoying. Perhaps not the end of the world for...
[18:06:23] mobrovac, you want to do it as a multiplier too? e.g. ncpu*1.3 :)
[18:06:48] yup
[18:07:03] yurik: i don't do half-assed code ;)
[18:07:04] :P
[18:07:15] not my style
[18:09:45] (CR) Yurik: [C: 1] service::node: Make the number of workers to start configurable [puppet] - https://gerrit.wikimedia.org/r/231319 (https://phabricator.wikimedia.org/T108888) (owner: Mobrovac)
[18:17:06] ok I'm gonna deploy the train unless there are any objections
[18:17:14] mobrovac, that's a very bold statement :-P
[18:17:31] so is this
[18:17:42] lolol
[18:17:45] lol twentyafterfour
[18:17:50] akosiaris, want to +2 minor https://gerrit.wikimedia.org/r/#/c/231319/
[18:18:30] mobrovac, thx for getting it done so quickly! now on to actually using it... where would i put it? :)
[18:19:29] yurik: i didn't include it since you want one setting for xx1 and xx3, and another for xx2 and xx4, but they have different roles applied to them (master/slave), unfortunately
[18:19:44] OutputPage::getModuleStyles: style module should define its position explicitly: ext.pygments ResourceLoad
[18:19:46] erFileModule [Called from OutputPage::getModuleStyles in /srv/mediawiki/php-1.26wmf18/includes/OutputPage.php at l
[18:19:48] ine 624] in /srv/mediawiki/php-1.26wmf18/includes/debug/MWDebug.php on line 300
[18:19:59] yurik: so, until that expression bug is done, i suggest going with the lower of the two numbers everywhere
[18:20:33] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1536271 (Yurik) NEW
[18:20:46] (PS1) RobH: grant services team full icinga permissions [puppet] - https://gerrit.wikimedia.org/r/231327
[18:20:46] yurik: i can amend the PS with that if you want, but for the numbers you want, akosiaris is competent for that
[18:20:46] that looks kinda nasty, it's got 3756 instances of that error message in wmf18 currently, so maybe I shouldn't deploy the train? That's gonna spam the logs pretty bad if wmf18 goes out to all wikis
[18:20:58] gwicke: ^ merging your teams icinga stuff
[18:21:41] robh: all alerts be gone ;)
[18:21:45] thanks!
[18:21:46] yuhuu
[18:21:48] if my git review decides to work that is
[18:22:25] (PS2) RobH: grant services team full icinga permissions [puppet] - https://gerrit.wikimedia.org/r/231327
[18:22:34] (PS1) Andrew Bogott: Remove network_host setting. [puppet] - https://gerrit.wikimedia.org/r/231328
[18:22:52] (CR) RobH: [C: 2] grant services team full icinga permissions [puppet] - https://gerrit.wikimedia.org/r/231327 (owner: RobH)
[18:24:00] forcing neon to puppet run now and babysitting it
[18:24:33] twentyafterfour, isn't that https://gerrit.wikimedia.org/r/#/c/230280/1/extension.json ?
[18:24:49] which was deployed to wmf18 already?
[18:24:53] RECOVERY - puppet last run on cp3039 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:25:06] oh, right, the error is from 1.26wmf18 in prod, ok
[18:25:20] ori, ^
[18:25:24] Krenair: yeah problem is it's alerting
[18:25:50] what's going on?
[18:26:09] ok, i saw the backlog
[18:26:11] let me fix it
[18:26:19] Ah, was about to yell about that.
[18:27:16] (PS2) Andrew Bogott: Remove network_host setting. [puppet] - https://gerrit.wikimedia.org/r/231328
[18:27:17] oh
[18:27:19] it's a warning
[18:27:22] but yeah, i'll fix it
[18:27:25] gwicke: Ok, you guys should be able to login and do stuff in icinga now, its live.
[18:27:32] it's a warning but it's spamming logs
[18:27:44] and it'll spam much worse if I deploy the train now
[18:27:49] Ops-Access-Requests, operations, Services, Icinga, Monitoring: give services team permissions to send commands in icinga - https://phabricator.wikimedia.org/T105228#1536298 (RobH) Open>Resolved Pushed live, should work fine.
[18:28:12] fair, i'll have a fix in a sec
[18:28:30] ok thanks ori,
[18:28:41] (CR) Andrew Bogott: [C: 2] Remove network_host setting. [puppet] - https://gerrit.wikimedia.org/r/231328 (owner: Andrew Bogott)
[18:28:51] operations, Discovery, Maps, Services, and 2 others: Puppetize Tilerator for deployment - https://phabricator.wikimedia.org/T105074#1536303 (Yurik) verified that tilerator shows up in both grafana & logstash. Waiting on the T108888 (not-bloked as it has much bigger scope) to set number of threads b...
[18:29:58] (CR) Dereckson: "I've asked bug requester to confirm the name, they indicated to be native speaker at https://phabricator.wikimedia.org/T104132#1535659" [mediawiki-config] - https://gerrit.wikimedia.org/r/231074 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[18:30:41] (CR) Dereckson: "I've asked bug requester to confirm the name, they indicated to be native speaker at https://phabricator.wikimedia.org/T104132#1535659" [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[18:31:16] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1536315 (Yurik)
[18:31:19] operations, Discovery, Maps, Services, and 2 others: Puppetize Tilerator for deployment - https://phabricator.wikimedia.org/T105074#1536314 (Yurik)
[18:32:26] !log ori@tin Synchronized php-1.26wmf18/extensions/SyntaxHighlight_GeSHi/extension.json: If0851400: Fix-up for I2de8a400d: explicitly declare module position (duration: 00m 12s)
[18:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:32:49] twentyafterfour: should be better
[18:33:05] ostriches: master patch still needs +2 if you have a sec -- https://gerrit.wikimedia.org/r/#/c/231329/
[18:34:32] ori: {{done}}
[18:34:40] thanks
[18:34:49] yw
[18:34:53] (PS1) Dzahn: fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073)
[18:35:00] (CR) jenkins-bot: [V: -1] fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073) (owner: Dzahn)
[18:35:04] twentyafterfour: I think I have a patch up too for the undefined index warning from Cirrus
[18:36:13] (PS2) Dzahn: fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073)
[18:36:19] (CR) jenkins-bot: [V: -1] fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073) (owner: Dzahn)
[18:36:33] Those SpecialMathShowImage ones are ugly too
[18:36:34] * ostriches files bug
[18:38:10] (CR) Dzahn: [C: 1] "on db1043, checked and there is 3306 and 3307 open here, that looks good, except i also see an rsyslogd running but no obvious puppet role" [puppet] - https://gerrit.wikimedia.org/r/228782 (https://phabricator.wikimedia.org/T104699) (owner: Muehlenhoff)
[18:40:49] (PS3) Dzahn: fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073)
[18:46:29] hi
[18:46:41] was the train completed or is something pending?
[18:48:10] hi aharoni
[18:48:21] twentyafterfour: There's already a fix in master for the "Call to a member function getIsPreview() on a non-object (NULL)" error.
[18:48:25] If we want to backport it
[18:49:20] ostriches: ok
[18:49:30] aharoni: pending
[18:55:10] operations, ops-eqiad: Remove and decommission wmf3098, 3100-3108 rack A3 to make room for hadoop nodes - https://phabricator.wikimedia.org/T108979#1536479 (Cmjohnson) NEW
[18:58:50] operations, ops-eqiad: Remove and decommission wmf3098, 3100-3108 rack A3 to make room for hadoop nodes - https://phabricator.wikimedia.org/T108979#1536515 (RobH) Agreed, please fully decommission and remove all info in dns for these, and zero out the disks/bios/mgmt. Please also update the spares page t...
[18:59:20] operations, ops-eqiad: Remove and decommission wmf3098, 3100-3108 rack A3 to make room for hadoop nodes - https://phabricator.wikimedia.org/T108979#1536518 (RobH) Chris, Mark and I mentioned this in passing earlier this week, so it isnt unexpected. We'll end up noting these on our accounting of hardware.
[19:00:54] operations, ops-eqiad: Remove and decommission wmf3098, 3100-3108 rack A3 to make room for hadoop nodes - https://phabricator.wikimedia.org/T108979#1536526 (RobH) I'll note we'll need @mark to approve full decommission, so please don't put in the decom racks in racktables. For now just no rack assignment...
[19:01:59] twentyafterfour: Working on a backport now for it
[19:04:06] (CR) Dzahn: [C: 2] fermium: ferm rule to allow rsync from sodium [puppet] - https://gerrit.wikimedia.org/r/231333 (https://phabricator.wikimedia.org/T108073) (owner: Dzahn)
[19:04:48] operations, RESTBase, RESTBase-Cassandra: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1536539 (Eevans) >>! In T95253#1535221, @fgiunchedi wrote: > also multiple instances means we'll need to adapt `cassandra-metrics-collector` to discover and work wit...
[19:05:42] * aharoni wonders why does he feel like he's the only one who is eager to test software deployed in production as soon as possible after the train.
[19:06:01] (PS2) Dzahn: wmnet: fix indentations for readability [dns] - https://gerrit.wikimedia.org/r/231020
[19:06:14] aharoni: you aren't the only one, you're just the one kept up the latest :)
[19:06:20] eh, this is new:
[19:06:22] Could not find dependent Exec[compile fragments]
[19:06:32] File[/frag-lists] at /etc/puppet/modules/rsync/manifests/server/module.pp:55
[19:06:35] hrmmm
[19:09:51] PROBLEM - puppet last run on fermium is CRITICAL puppet fail
[19:10:10] PROBLEM - check_payments_wiki on payments1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 403 Forbidden - string OK not found on https://127.0.0.1:443https://payments.wikimedia.org/index.php/Special:SystemStatus - 418 bytes in 0.012 second response time
[19:10:16] uhhhh
[19:10:32] (PS1) Dzahn: fermium: re-include rsync server [puppet] - https://gerrit.wikimedia.org/r/231390
[19:10:34] fermium is me and nothing to worry
[19:10:37] payments is unrelated
[19:11:39] but we had the same yesterday and Jeff said .. eh...
[19:11:50] < Jeff_Green> ^^ on the payments1004 situation, it has to do with modsecurity testing
[19:12:34] (CR) Dzahn: [C: 2] fermium: re-include rsync server [puppet] - https://gerrit.wikimedia.org/r/231390 (owner: Dzahn)
[19:13:41] RECOVERY - puppet last run on fermium is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures
[19:14:24] twentyafterfour: It's https://gerrit.wikimedia.org/r/#/c/231391/ when you're ready for it
[19:15:10] PROBLEM - check_payments_wiki on payments1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 403 Forbidden - string OK not found on https://127.0.0.1:443https://payments.wikimedia.org/index.php/Special:SystemStatus - 418 bytes in 0.012 second response time
[19:16:10] operations: Memcached TIMEOUT error spam from memcached log for global:slave_lag keys - https://phabricator.wikimedia.org/T108982#1536590 (aaron) NEW
[19:17:07] ostriches: thanks!
[19:17:55] ostriches: uhm, it says zero changes in that commit
[19:18:14] oh it's just the submodule bump
[19:18:45] Yerp
[19:18:51] I did the rest of it
[19:19:08] Also, just filed a bug for that divide-by-zero bug in PagedTiffHandler
[19:20:10] PROBLEM - check_payments_wiki on payments1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 403 Forbidden - string OK not found on https://127.0.0.1:443https://payments.wikimedia.org/index.php/Special:SystemStatus - 418 bytes in 0.012 second response time
[19:22:49] (Abandoned) 20after4: Rotate apache2 logs more often on phabricator web host [puppet] - https://gerrit.wikimedia.org/r/230382 (https://phabricator.wikimedia.org/T108503) (owner: 20after4)
[19:24:07] blargh payments1004. looking
[19:25:10] PROBLEM - check_payments_wiki on payments1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 403 Forbidden - string OK not found on https://127.0.0.1:443https://payments.wikimedia.org/index.php/Special:SystemStatus - 418 bytes in 0.012 second response time
[19:26:47] the amusing thing is that it's because modsecurity works, it's blocking the nagios check because the request lacks an Accept header
[19:27:10] (PS1) Dzahn: fermium: adjust rsync import dir, ensure package [puppet] - https://gerrit.wikimedia.org/r/231394 (https://phabricator.wikimedia.org/T108073)
[19:28:25] Jeff_Green: check_http has options to send header, maybe this:
[19:28:28] -k, --header=STRING Any other tags to be sent in http header. Use multiple times for additional headers
[19:29:23] mutante: cool, that might come in handy. I need to figure out how to write exceptions in the mod-sec paradigm because it blocks other stuff as well
[19:30:11] RECOVERY - check_payments_wiki on payments1004 is OK: HTTP OK: HTTP/1.1 200 OK - 249 bytes in 0.033 second response time
[19:31:22] !log demon@tin Synchronized php-1.26wmf18/extensions/Graph: (no message) (duration: 00m 11s)
[19:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:31:32] twentyafterfour: fix for that Graph bug live ^
[19:34:57] operations, Availability: Create MediaWiki session monitoring - https://phabricator.wikimedia.org/T108985#1536692 (aaron) NEW
[19:35:39] !log demon@tin Synchronized php-1.26wmf18/extensions/Graph: (no message) (duration: 00m 10s)
[19:35:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:37:57] ostriches: ok.
[19:38:16] !log demon@tin Synchronized php-1.26wmf18/extensions/CirrusSearch/: (no message) (duration: 00m 11s)
[19:38:17] I wmf/1.26wmf18 looks good, ready to deploy the train now
[19:38:20] Also that one ^
[19:38:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:38:33] I'm out of your way
[19:38:37] operations, Availability: Monitor MediaWiki sessions - https://phabricator.wikimedia.org/T108985#1536736 (ori) p:Triage>High
[19:40:07] (PS1) 20after4: all wikis to 1.26wmf18 [mediawiki-config] - https://gerrit.wikimedia.org/r/231397
[19:40:24] (CR) 20after4: [C: 2] all wikis to 1.26wmf18 [mediawiki-config] - https://gerrit.wikimedia.org/r/231397 (owner: 20after4)
[19:40:30] (Merged) jenkins-bot: all wikis to 1.26wmf18 [mediawiki-config] - https://gerrit.wikimedia.org/r/231397 (owner: 20after4)
[19:56:56] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: all wikis to 1.26wmf18
[19:57:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:58:02] wee, deployed.
[19:58:20] Special:Version in he.wikipedia shows libraries description in English from left to right rather than from right to left.
[19:58:39] a rather unimportant feature, but very nice to see.
[19:59:44] (PS1) Ori.livneh: xenon: fix for xenon-generate-svgs [puppet] - https://gerrit.wikimedia.org/r/231400
[19:59:46] aharoni: :)
[19:59:56] (PS2) Ori.livneh: xenon: fix for xenon-generate-svgs [puppet] - https://gerrit.wikimedia.org/r/231400
[20:00:06] (CR) Ori.livneh: [C: 2 V: 2] xenon: fix for xenon-generate-svgs [puppet] - https://gerrit.wikimedia.org/r/231400 (owner: Ori.livneh)
[20:00:11] and VE works on my phone \o/ James_F
[20:00:49] cool
[20:03:32] operations: dataset1001/dumps rsync setup should use rsync::server from module - https://phabricator.wikimedia.org/T108992#1536898 (Dzahn) NEW
[20:04:03] operations: dataset1001/dumps rsync setup should use rsync::server from module - https://phabricator.wikimedia.org/T108992#1536909 (Dzahn) a:ArielGlenn
[20:05:18] in phabricator, what does a small red dot in front of a username mean?
[20:05:27] communist
[20:05:30] haha
[20:05:56] they are absent in teh calendar
[20:05:59] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1536921 (mobrovac) > For now - lets hardcode it to the values 4 for the 8 core machines, and 6 for the 12 core This creates a Puppet mess, since...
[20:06:02] on vacation or unavailable or whatever
[20:06:06] also communists
[20:06:09] Yup.
[20:06:16] * James_F sings The Red Flag to himself.
[20:06:34] communists on vacation, i see
[20:06:38] (PS1) Andrew Bogott: Replace some hardcoded labnet1001 refs with hiera values. [puppet] - https://gerrit.wikimedia.org/r/231406 (https://phabricator.wikimedia.org/T99701)
[20:06:40] thanks, would not have guessed
[20:07:00] i was thinking "doesnt have permission to read this" or something :p
[20:07:09] da, comrad
[20:07:31] (CR) jenkins-bot: [V: -1] Replace some hardcoded labnet1001 refs with hiera values. [puppet] - https://gerrit.wikimedia.org/r/231406 (https://phabricator.wikimedia.org/T99701) (owner: Andrew Bogott)
[20:08:14] "leisurists, unite!"
[20:09:01] mutante: a hovertext would be nice, yeah
[20:09:03] (PS2) Andrew Bogott: Replace some hardcoded labnet1001 refs with hiera values. [puppet] - https://gerrit.wikimedia.org/r/231406 (https://phabricator.wikimedia.org/T99701)
[20:09:05] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1536945 (BBlack) Puppet has access to the core counts (both logical/hyperthread counts and true physical core counts) to use as a mathematical bas...
[20:12:22] (PS3) Andrew Bogott: Replace some hardcoded labnet1001 refs with hiera values. [puppet] - https://gerrit.wikimedia.org/r/231406 (https://phabricator.wikimedia.org/T99701)
[20:12:30] operations, Reading-Web: Run browser tests to verify the existence of certain texts in the footer - https://phabricator.wikimedia.org/T108081#1536968 (chasemp)
[20:17:10] dyslexics of the world, untie!
[20:17:49] http://untied.com
[20:18:50] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1536999 (Yurik) OK, in that case let's keep it as ncpu -- I will finish running the current batch (about 4 days) which I run manually, after which...
[20:20:47] operations, Services, Discovery-Maps-Sprint: Configure maps cluster's tilerator to the specific number of workers - https://phabricator.wikimedia.org/T108974#1537008 (mobrovac) >>! In T108974#1536945, @BBlack wrote: > Puppet has access to the core counts (both logical/hyperthread counts and true physica...
[20:22:20] (PS2) Dzahn: fermium: adjust rsync import dir, ensure package [puppet] - https://gerrit.wikimedia.org/r/231394 (https://phabricator.wikimedia.org/T108073)
[20:22:39] (CR) Dzahn: [C: 2] fermium: adjust rsync import dir, ensure package [puppet] - https://gerrit.wikimedia.org/r/231394 (https://phabricator.wikimedia.org/T108073) (owner: Dzahn)
[20:27:11] operations, Discovery, Traffic, Wikidata, and 2 others: Set up a public interface to the wikidata query service - https://phabricator.wikimedia.org/T107602#1537061 (Joe) https://query.wikidata.org :)
[20:27:22] operations, Discovery, Traffic, Wikidata, and 2 others: Set up a public interface to the wikidata query service - https://phabricator.wikimedia.org/T107602#1537063 (Joe) Open>Resolved
[20:32:25] operations, Discovery, Traffic, Wikidata, and 2 others: Set up a public interface to the wikidata query service - https://phabricator.wikimedia.org/T107602#1537075 (Smalyshev) yay! {meme, src="antoine-approve"}
[20:33:04] ganglia seems to have forgotten how to get data on the logstash elasticsearch jvms -- https://ganglia.wikimedia.org/latest/?c=Logstash+cluster+eqiad&&m=es_heap_used
[20:33:55] is gmond the thing to bounce to try and fix that?
[20:34:49] "Failed to restart gmond.service: Unit gmond.service failed to load: No such file or directory."
[20:39:09] (PS1) Ori.livneh: xenon-generate-svg: fix handling of reversed flame graphs [puppet] - https://gerrit.wikimedia.org/r/231417
[20:39:23] (CR) Ori.livneh: [C: 2 V: 2] xenon-generate-svg: fix handling of reversed flame graphs [puppet] - https://gerrit.wikimedia.org/r/231417 (owner: Ori.livneh)
[20:39:43] operations, Reading-Web: Run browser tests to verify the existence of certain texts in the footer - https://phabricator.wikimedia.org/T108081#1537131 (bmansurov) @chasemp, great! Thanks.
[20:40:31] !log ganglia not getting elasticsearch jvm data for logstash cluster since 2015-08-13T12:00 -- https://ganglia.wikimedia.org/latest/?c=Logstash+cluster+eqiad&&m=es_heap_used
[20:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:41:21] ^ that may be related to the work _joe_ was doing on ganglia::monitor::aggregator (or maybe not but the time looks like it lines up)
[20:42:20] operations, CirrusSearch, Discovery, hardware-requests, Discovery-Cirrus-Sprint: Request Elasticsearch hardware for secondary CirrusSearch in codfw - https://phabricator.wikimedia.org/T105707#1537144 (Deskana) It's my understanding that there is no specific action required //for this ticket// fr...
[20:42:27] operations, CirrusSearch, Discovery, hardware-requests: Request Elasticsearch hardware for secondary CirrusSearch in codfw - https://phabricator.wikimedia.org/T105707#1537145 (Deskana)
[20:42:32] bd808: 1) determine aggregator host for the DC (carbon) 2) find the config file for this cluster: grep Logstash /etc/ganglia/aggregators/* (1043.conf) 3) grep pid for this config file (ps aux | grep 1043) (10492) 4) kill it 5) sudo -u ganglia /usr/sbin/gmond -c /etc/ganglia/aggregators/1043.conf
[20:43:30] bd808: eh, but this bug seems to be different since the graph is still created but no data is coming in
[20:43:52] I don't have rights to poke about on carbon either
[20:45:31] bd808: does this look like it continues https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Logstash%20cluster%20eqiad&m=es_heap_used&r=hour&s=by%20name&hc=4&mc=2&st=1439498709&g=mem_report&z=large
[20:46:20] !log killed ganglia aggregator for logstash on carbon
[20:46:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:46:38] es_heap_used is the one I noticed specifically as missing. there is some magic that send jvm data to ganglia
[20:50:30] i dont see that in puppet at all, except that elastic search uses it for monitoring
[20:50:50] hence "magic"
[20:51:20] mobrovac, do you know the best place to put that worker cpu calc for maps?
[20:52:03] yurik: using bblack's processorcount, yes
[20:52:18] (if we are talking about puppet)
[20:52:36] mobrovac, yep, just not sure which file it should go into
[20:53:35] yurik: lemme amend https://gerrit.wikimedia.org/r/231319
[20:53:53] ty!
[20:54:02] better yet, a dependant ps
[21:00:52] bd808: i dont see a difference to the other aggregators that appear to work normal, and also there is data for other metrics like Load http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Logstash+cluster+eqiad&m=es_fs_writes&s=by+name&mc=2&g=load_report
[21:01:15] so i think it is not the aggregator itself, even though the timing and joe's changes look like it might be related
[21:01:46] do we know why there is that drop in load?
[21:01:52] mutante: *nod* I think I remember that there is something on the hosts being monitored that can blow up for this. manybubbles would know
[21:03:15] * bd808 tries sudo /etc/init.d/ganglia-monitor restart
[21:04:40] mutante: that fixed it -- sudo /etc/init.d/ganglia-monitor restart
[21:04:46] on each of the hosts
[21:04:57] on the logstash hosts? eh, ok...
[21:05:00] the changes he made were to the ganglia-aggregator process though, not ganglia-monitor
[21:05:12] well, good :)
[21:06:03] !log `sudo /etc/init.d/ganglia-monitor restart` on logstash100[1-6] fixed ganglia data loss
[21:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:09:03] (PS2) Mobrovac: service::node: Make the number of workers to start configurable [puppet] - https://gerrit.wikimedia.org/r/231319 (https://phabricator.wikimedia.org/T108888)
[21:09:19] (PS1) Mobrovac: Tilerator: start ncpu / 2 workers [puppet] - https://gerrit.wikimedia.org/r/231427 (https://phabricator.wikimedia.org/T108974)
[21:09:29] yurik: ^^
[21:09:36] mobrovac, already looking )
[21:11:16] (CR) Yurik: [C: 1] Tilerator: start ncpu / 2 workers [puppet] - https://gerrit.wikimedia.org/r/231427 (https://phabricator.wikimedia.org/T108974) (owner: Mobrovac)
[21:12:13] (PS6) Andrew Bogott: Update archive-project-volumes to support our new NFS setup. [puppet] - https://gerrit.wikimedia.org/r/229458 (https://phabricator.wikimedia.org/T104857)
[21:13:40] (CR) Andrew Bogott: [C: 2] Update archive-project-volumes to support our new NFS setup. [puppet] - https://gerrit.wikimedia.org/r/229458 (https://phabricator.wikimedia.org/T104857) (owner: Andrew Bogott)
[21:14:47] !log service gitblit restart on antimony (maybe that should be paging :)
[21:14:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:15:25] (PS2) Mobrovac: Tilerator: start ncpu / 2 workers [puppet] - https://gerrit.wikimedia.org/r/231427 (https://phabricator.wikimedia.org/T108974)
[21:16:25] operations, RESTBase, RESTBase-Cassandra: Cassandra internode TLS encryption - https://phabricator.wikimedia.org/T108953#1537346 (Eevans) >>! In T108953#1535313, @Joe wrote: > @Eevans a few questions: > - how should the certs be created (I mean, what should be the name on the cert)? I don't think thi...
[21:16:30] !log torrus broken - doing https://wikitech.wikimedia.org/wiki/Torrus#Deadlock_problem
[21:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:17:31] RECOVERY - torrus.wikimedia.org HTTP on netmon1001 is OK: HTTP OK: HTTP/1.1 200 OK - 2166 bytes in 0.533 second response time
[21:25:32] operations, Gitblit-Deprecate: evaluate "klaus" to replace gitblit as a git web viewer - https://phabricator.wikimedia.org/T109004#1537389 (Dzahn) NEW
[21:26:11] operations, Gitblit-Deprecate, Wikimedia-Git-or-Gerrit: git.wikimedia.org is unstable - https://phabricator.wikimedia.org/T83702#917458 (Dzahn) also see T109004
[21:33:40] operations, Gitblit-Deprecate: evaluate "klaus" to replace gitblit as a git web viewer - https://phabricator.wikimedia.org/T109004#1537448 (Dzahn)
[21:37:18] operations: dataset1001/dumps rsync setup should use rsync::server from module - https://phabricator.wikimedia.org/T108992#1537475 (Dzahn)
[21:37:57] !log ori@tin Synchronized php-1.26wmf18/includes/OutputPage.php: Test impact of I5e6c79c (duration: 00m 12s)
[21:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:40:11] operations, Continuous-Integration-Infrastructure: Update RDiscount gem on jenkins build servers (UbuntuTrusty) - https://phabricator.wikimedia.org/T109005#1537495 (cscott)
[21:43:50] !log ori@tin Synchronized php-1.26wmf18/includes/OutputPage.php: Roll back: Test impact of I5e6c79c (duration: 00m 12s)
[21:43:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:46:02] Reedy, you didn't deploy your SRF and Validator changes?
[21:46:07] No
[21:46:11] I didn't see any point
[21:46:16] I'm not sure they'll fix it either
[21:46:41] they're unmerged on tin but on the deployment branches in gerrit
[21:47:06] oh, damn auto updating submodules
[21:49:08] operations, RESTBase, RESTBase-Cassandra: Cassandra internode TLS encryption - https://phabricator.wikimedia.org/T108953#1537552 (Eevans) So (for demonstrative purposes only), here is how the process of {key,trust}store generation usually goes: ``` #!/bin/bash...
[21:50:05] They are noops for the live sites, so can be deployed
[21:51:15] operations, Continuous-Integration-Infrastructure: Update RDiscount gem/package on jenkins build servers (UbuntuTrusty) - https://phabricator.wikimedia.org/T109005#1537575 (cscott)
[21:54:00] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537587 (RobH) As others point out, its always broken. I used to use it for power planning, but the same data can be pulled from librenms.
[21:55:00] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537591 (RobH) a:mark I've assigned this to @mark for his confirmation we can kill the torrus service and remove it entirely.
[21:57:33] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537607 (RobH) Also when it is broken, it doesn't retain data for the outage period, so any historical graphs have large gaps. Mark will need to confirm for his traffic stats though.
[21:58:25] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537617 (RobH) p:Low>Normal
[21:58:50] operations, Technical-Debt: Retire Torrus - https://phabricator.wikimedia.org/T87840#1000364 (RobH)
[22:02:27] operations, Technical-Debt: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537626 (Dzahn) it was broken again today and i fixed it with the usual steps from wikitech docs
[22:03:38] (PS1) BryanDavis: Return super().main() when overriding AbstractSync.main() [tools/scap] - https://gerrit.wikimedia.org/r/231442 (https://phabricator.wikimedia.org/T109007)
[22:14:13] !log Running refreshLinks --dfn-only in a screen on terbium for T44180
[22:14:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:24:32] operations, Technical-Debt: Retire Torrus - https://phabricator.wikimedia.org/T87840#1537703 (Cmjohnson) +1 on killing it
[22:40:03] operations: Run browser tests to verify the existence of certain texts in the footer - https://phabricator.wikimedia.org/T108081#1537780 (Jdlrobson)
[22:42:53] (CR) MaxSem: Added tilerator service, granted kartotherian OSM DB read access (1 comment) [puppet] - https://gerrit.wikimedia.org/r/229727 (https://phabricator.wikimedia.org/T105074) (owner: Yurik)
[22:45:33] operations: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1512936 (Dzahn) If T284 happens that might help a lot with getting this setup.
[22:45:47] (CR) Yurik: Added tilerator service, granted kartotherian OSM DB read access (1 comment) [puppet] - https://gerrit.wikimedia.org/r/229727 (https://phabricator.wikimedia.org/T105074) (owner: Yurik)
[22:56:40] operations, Human-Resources: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1537823 (Pine)
[22:57:37] operations, Human-Resources: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1512936 (Pine) @Dzahn: Human-resources added at your request.
[23:00:04] RoanKattouw ostriches rmoen Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150813T2300).
[23:00:04] James_F matt_flaschen Krenair: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:13] Adsum.
[23:00:20] (PS2) Alex Monk: Logo on mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:00:50] (CR) Alex Monk: "I put this through optipng -o7, which we should be doing for all new static image files." [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:01:31] Krenair: You running SWAT?
[23:01:40] I guess so...
[23:01:56] Krenair: We can bully others into doing it… :-)
[23:01:59] Krenair, I can do it if you want.
[23:02:06] (PS4) Alex Monk: Enable VisualEditor for 25% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/227330 (owner: Jforrester)
[23:02:12] (CR) Alex Monk: [C: 2] Enable VisualEditor for 25% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/227330 (owner: Jforrester)
[23:02:17] (Merged) jenkins-bot: Enable VisualEditor for 25% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/227330 (owner: Jforrester)
[23:03:26] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/227330/ (duration: 00m 12s)
[23:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:10:55] matt_flaschen, ^ please test
[23:10:59] !log krenair@tin Synchronized php-1.26wmf18/extensions/Flow: https://gerrit.wikimedia.org/r/#/c/231450/1 (duration: 00m 14s)
[23:11:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:12:05] Krenair, thanks, standard functionality is working. I will do a conversion shortly.
[23:13:51] (PS3) Alex Monk: Logo on mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:13:57] (CR) Alex Monk: [C: 2] Logo on mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:14:04] (Merged) jenkins-bot: Logo on mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/228477 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:14:09] (PS2) Alex Monk: Set site name for mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/231074 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:14:17] (CR) Alex Monk: [C: 2] Set site name for mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/231074 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:14:24] (Merged) jenkins-bot: Set site name for mr.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/231074 (https://phabricator.wikimedia.org/T104132) (owner: Dereckson)
[23:15:22] !log krenair@tin Synchronized w/static/images/project-logos/mrwikibooks.png: https://gerrit.wikimedia.org/r/#/c/228477/ (duration: 00m 12s)
[23:15:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:16:24] !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/228477/ and https://gerrit.wikimedia.org/r/#/c/231074/ (duration: 00m 12s)
[23:16:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:22:07] (PS1) Dzahn: add table for miraheze and bump version [debs/wikistats] - https://gerrit.wikimedia.org/r/231459 (https://phabricator.wikimedia.org/T107398)
[23:33:23] (PS1) Mobrovac: MobileApps: Do not use the proxy to issue requests [puppet] - https://gerrit.wikimedia.org/r/231463
[23:35:42] (PS1) Jforrester: Enable VisualEditor for 50% of new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/231464
[23:35:44] (PS1) Jforrester: Enable VisualEditor for all new accounts on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/231465 (https://phabricator.wikimedia.org/T90664)
[23:43:58] (CR) Jforrester: [C: -1] "Not yet." [mediawiki-config] - https://gerrit.wikimedia.org/r/231465 (https://phabricator.wikimedia.org/T90664) (owner: Jforrester)
[23:44:17] (CR) Jforrester: "Roughly scheduled for next Thursday, 20 August." [mediawiki-config] - https://gerrit.wikimedia.org/r/231464 (owner: Jforrester)
[23:47:29] operations, MediaWiki-extensions-CentralAuth: Special:GlobalUsers varies between claiming a user is or isn't attached - https://phabricator.wikimedia.org/T102915#1537995 (Krenair) Here's the full query it runs for Glaisher: ```lang=sql SELECT gu_id,gu_name,gu_locked,lu_attached_method,COUNT(gug_group) AS...
[23:57:52] operations, Human-Resources: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1538031 (Pine) @Dzahn Joady has informed me that HR is currently using Asana exclusively, so I am locking down the Phabricator HR project as best as possible while keeping the inform...
[23:57:58] operations, Citoid, Graphoid, Mobile-Apps, and 2 others: SCA services should not use a proxy for our domains - https://phabricator.wikimedia.org/T97530#1538032 (mobrovac) p:Normal>High
[23:59:12] operations, Citoid, Graphoid, Mobile-Apps, and 3 others: SCA services should not use a proxy for our domains - https://phabricator.wikimedia.org/T97530#1538035 (mobrovac)