[00:15:35] (CR) Krinkle: [C: -1] Use a textarea for content differences (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/370160 (https://phabricator.wikimedia.org/T172362) (owner: Giuseppe Lavagetto)
[00:31:41] Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3502819 (Harej) >>! In T172417#3498059, @Harej wrote: > Note that this is pending final c-level approval. Update: Approval has b...
[00:32:28] PROBLEM - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:33:11] mutante is that ^^ phd?
[00:40:13] Operations, Domains, Traffic, Wikimedia Resource Center, Patch-For-Review: Create resources.wikimedia.org as a redirect - https://phabricator.wikimedia.org/T172417#3497968 (Krinkle) > We want to use a short link that people can access that is easy to remember and type on any browser, on any d...
[00:42:57] (CR) Krinkle: [C: 1] phpcs for refresh-dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/369808 (owner: Reedy)
[01:29:29] ACKNOWLEDGEMENT - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn inactive server should not have phd
[01:32:08] paladox: rsync initial run finished. i think it was pretty busy with that. also without phd running how can it "feel slow" :)
[01:32:35] which protocol
[01:44:11] Operations, MediaWiki-extensions-Score: crackling at start of OGG renditions of MIDI files (fixed in TiMidity++ 2.14.0) - https://phabricator.wikimedia.org/T50029#3502881 (Reedy)
[02:16:13] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3502957 (Dzahn)
[02:17:43] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3502961 (Reedy)
[02:33:40] Reedy: Do you know of any recent code or config changes to AbuseFilter?
[02:34:08] I ran an XHProf profile on mwdebug1001 on mw.org when making an edit and finding some 750ms spent in AbuseFilter
[02:34:16] 950ms *
[02:34:23] Context: T172447
[02:34:24] T172447: Investigate 2017-08-02 Save Timing regression (+40-60%) - https://phabricator.wikimedia.org/T172447
[03:10:19] (CR) Andrew Bogott: [C: 2] toolforge: Add qstat-full to bastions [puppet] - https://gerrit.wikimedia.org/r/370298 (owner: BryanDavis)
[03:26:08] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 607.94 seconds
[03:59:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 284.75 seconds
[04:21:04] Question: is a fatal in production considered an UBN even if its trigger is rather obscure?
[04:56:04] harej: If it's a regression (it didn't use to fatal) then yes, I'd mark it as UBN.
[05:03:21] Krinkle: I don't know if this specific situation didn't use to fatal, but I think in principle a fatal in production is not good?
[05:13:04] harej: not good no, what's the issue? It also depends on severity (how many people effected, how frequently used)
[05:13:11] sadly our software is not currently bug free ;)
[05:13:34] https://phabricator.wikimedia.org/T172588
[05:23:16] harej: is that the only page you've found that fatals?
[05:24:24] That I've seen, yes
[05:24:48] that's good at least
[05:24:57] I wonder who proposed "How to volunteer editing server config and get a code change into Ops repos." on https://wikimania2017.wikimedia.org/wiki/Hackathon/Program
[05:24:59] And under that exact arrangement. The same page doesn't fatal if you restore it back to its non cursed version
[05:27:44] ah, mutante did :) (the session on getting things deployed)
[05:27:56] mutante: does that mean you'll be there?
[05:27:58] * greg-g goes
[05:33:08] harej: I've dug up the full error message for you and added it to the task
[05:33:14] Looks like it has to do with nested tags
[05:33:33] so if you haven't found a way around the fatal yet, this might help.
[05:33:51] Ah. Normally they're not allowed, but if you subst a thing that has Translate tags...
[05:34:03] The ideal outcome of the task in this case would be to error in a "better" way (tell you what's wrong), but it will likely remain an error, however.
[05:34:31] {{subst:translatable horror}}
[05:34:45] Yeah, applies after everything else, so there's no good safesubt:-ish thing for it.
[05:34:46] And yes the proper thing would be to have it error
[05:36:33] Or we could scorch the earth the translate extension lies on, but we can all dream. :)
[10:09:26] (PS1) MarcoAurelio: Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301)
[10:17:31] (PS2) MarcoAurelio: Allow bureaucrats on WMF wikis to grant and remove 'confirmed' [mediawiki-config] - https://gerrit.wikimedia.org/r/368939 (https://phabricator.wikimedia.org/T101983)
[11:10:38] (PS1) MarcoAurelio: Grant 'autopatrol' to 'editor' in en.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/370311 (https://phabricator.wikimedia.org/T172561)
[11:16:47] (PS1) MarcoAurelio: Translate sitename for nl.wikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/370313 (https://phabricator.wikimedia.org/T172594)
[11:54:28] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:54:28] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:54:29] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:54:38] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:54:38] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:54:48] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:55:18] PROBLEM - salt-minion processes on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:55:19] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[11:59:50] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:02:22] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3503306 (Urbanecm) a: Reedy
[12:04:19] RECOVERY - salt-minion processes on stat1005 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:04:28] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[12:04:28] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[12:04:29] RECOVERY - Disk space on stat1005 is OK: DISK OK
[12:04:38] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[12:04:38] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[12:04:39] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[12:04:49] RECOVERY - DPKG on stat1005 is OK: All packages OK
[12:29:48] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Sat 2017-08-05 12:29:42 UTC.
[12:57:18] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:48] PROBLEM - salt-minion processes on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:48] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:49] PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:49] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:58] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:57:59] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[12:58:08] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:01:48] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds
[13:04:18] RECOVERY - DPKG on stat1005 is OK: All packages OK
[13:04:48] RECOVERY - salt-minion processes on stat1005 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:04:49] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[13:04:49] RECOVERY - Disk space on stat1005 is OK: DISK OK
[13:04:58] RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[13:04:59] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[13:05:08] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[13:05:09] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[13:05:33] Database locked on Commons?
[13:12:49] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 385.61 seconds
[13:13:48] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[13:17:34] (PS1) Ladsgroup: mediawiki: Another increase of batch size in dispatchChanges cronjob [puppet] - https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263)
[13:21:02] (CR) Sjoerddebruin: [C: 1] "Hope the dispatch will be "dope" with that batch size. ;)" [puppet] - https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[13:22:43] (CR) Luke081515: [C: 1] Enabling OAuth on foundationwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/370310 (https://phabricator.wikimedia.org/T170301) (owner: MarcoAurelio)
[13:31:48] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Sat 2017-08-05 13:31:42 UTC.
[13:48:13] (CR) Lucas Werkmeister (WMDE): "Load on Terbium may be fine, but if I understand correctly, we also need to watch out that we don’t overflow the change queues on the clie" [puppet] - https://gerrit.wikimedia.org/r/370315 (https://phabricator.wikimedia.org/T171263) (owner: Ladsgroup)
[14:13:01] !log reedy@tin Synchronized php-1.30.0-wmf.12/extensions/WikimediaMaintenance/createExtensionTables.php: add oauth (duration: 00m 48s)
[14:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:20] Reedy: https://phabricator.wikimedia.org/diffusion/EWMA/browse/master/createExtensionTables.php still not updated?
[14:24:46] replication lag?
[14:24:55] i think phab only does it every few mins
[14:26:22] ever 30 secs
[14:26:25] if the repo is active
[14:26:30] in the last few days
[14:26:39] ever = every
[14:27:02] https://phabricator.wikimedia.org/diffusion/EWMA/manage/status/
[14:27:06] shows 45 minutes
[14:28:05] true that
[14:28:52] they're are there now :)
[14:32:07] yep :)
[14:32:18] clicking the update now button should make the repo update
[14:32:38] though only repo admins and owners of the repo and admins can do that
[14:40:43] !log created oauth tables on foundationwiki T172591
[14:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:55] T172591: Create OAuth tables for foundationwiki - https://phabricator.wikimedia.org/T172591
[14:41:16] Operations, Wikimedia-Mailing-lists: lists.wikimedia.org (208.80.154.21) blocked by Trend Micro - https://phabricator.wikimedia.org/T172602#3503473 (Platonides)
[14:42:30] Operations, Mail, Wikimedia-Mailing-lists: lists.wikimedia.org (208.80.154.21) blocked by Trend Micro - https://phabricator.wikimedia.org/T172602#3503485 (Platonides)
[14:45:08] PROBLEM - MariaDB Slave Lag: s4 on db2051 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.16 seconds
[14:45:18] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.56 seconds
[14:45:19] PROBLEM - MariaDB Slave Lag: s4 on db2058 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.69 seconds
[14:45:28] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 315.76 seconds
[14:45:38] PROBLEM - MariaDB Slave Lag: s4 on db2019 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 324.18 seconds
[14:47:19] RECOVERY - MariaDB Slave Lag: s4 on db2058 is OK: OK slave_sql_lag Replication lag: 44.23 seconds
[14:47:28] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 16.97 seconds
[14:47:38] RECOVERY - MariaDB Slave Lag: s4 on db2019 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[14:48:09] RECOVERY - MariaDB Slave Lag: s4 on db2051 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[14:48:18] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.49 seconds
[14:54:18] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[15:01:18] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[15:05:08] Operations, Pybal, Traffic: lvs servers report 'Memory allocation problem' on bootup - https://phabricator.wikimedia.org/T82849#3503596 (ema) A more general patch has been submitted by Julian Anastasov http://archive.linuxvirtualserver.org/html/lvs-devel/2017-08/msg00001.html \o/
[17:28:48] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused
[19:06:17] PROBLEM - Check systemd state on mw1285 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:06:17] PROBLEM - Disk space on mw1285 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[19:06:17] PROBLEM - HHVM rendering on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 9.453 second response time
[19:06:18] PROBLEM - Apache HTTP on mw1285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 8.423 second response time
[19:07:07] RECOVERY - Check systemd state on mw1285 is OK: OK - running: The system is fully operational
[19:07:07] RECOVERY - Disk space on mw1285 is OK: DISK OK
[19:07:08] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 73363 bytes in 0.173 second response time
[19:07:08] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.074 second response time
[20:01:38] Operations, MediaWiki-Database, NewPHP, Patch-For-Review, Technical-Debt: Remove old mysql extension support in favor of mysqli - https://phabricator.wikimedia.org/T120333#3503847 (Reedy) We should probably make a move on this... However, if we look at WMF production where we're still using...
[21:58:46] hey twentyafterfour i think the status in topic should updated, but if im incorrect feel free to disregard this msg.
[22:08:57] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[22:37:05] (PS1) Ebe123: Run Lilypond from Firejail [mediawiki-config] - https://gerrit.wikimedia.org/r/370358 (https://phabricator.wikimedia.org/T171372)
[22:45:08] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[23:41:29] (PS1) Ebe123: Run Lilypond from Firejail [puppet] - https://gerrit.wikimedia.org/r/370361 (https://phabricator.wikimedia.org/T171372)