[01:12:20] <icinga-wm>	 PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:10] <icinga-wm>	 RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 73987 bytes in 0.140 second response time
[01:41:29] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:15:39] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:15:59] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:29:20] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 650.51 seconds
[04:15:19] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[04:27:40] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 223.90 seconds
[04:43:29] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:22:50] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:27:59] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:27:59] <icinga-wm>	 PROBLEM - Disk space on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:28:29] <icinga-wm>	 PROBLEM - nutcracker port on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:52:39] <icinga-wm>	 PROBLEM - IPMI Temperature on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:17:26] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: systemd::service: convert a bunch of modules to it [puppet] - 10https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078)
[09:17:59] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[10:53:42] <TabbyCat>	 >>> UNRECOVERABLE FATAL ERROR <<<
[10:53:44] <TabbyCat>	 Undefined class constant 'STATUS_CLOSED'
[10:53:45] <TabbyCat>	 /srv/deployment/phabricator/deployment-cache/revs/3d728e1f6bb6c82cc46d3b062c2d0f49f0823694/phabricator/src/applications/differential/query/DifferentialRevisionSearchEngine.php:137
[13:44:59] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 607.73 seconds
[13:49:09] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 845.36 seconds
[13:56:09] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 224.83 seconds
[14:05:09] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on db1047 is OK: OK slave_sql_lag Replication lag: 12.98 seconds
[14:26:11] <wikibugs>	 (03CR) 10Krinkle: systemd::service: convert a bunch of modules to it (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078) (owner: 10Giuseppe Lavagetto)
[14:29:18] <wikibugs>	 10Operations, 10Android-app-feature-Compilations, 10Reading-Infrastructure-Team-Backlog, 10Traffic, and 2 others: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3520604 (10Mholloway)
[14:29:20] <wikibugs>	 10Operations, 10Android-app-feature-Compilations, 10Reading-Infrastructure-Team-Backlog, 10Wikipedia-Android-App-Backlog: Create 'pagecompilation' Swift account(s) (beta + prod) for Readers offline article compilations project - https://phabricator.wikimedia.org/T172735#3520603 (10Mholloway) 05Open>03Re...
[14:34:23] <wikibugs>	 (03PS17) 10Rush: tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:34:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: 10Rush)
[14:39:55] <wikibugs>	 (03PS18) 10Rush: tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:40:22] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: 10Rush)
[14:42:34] <wikibugs>	 (03PS19) 10Rush: tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:46:24] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: systemd::service: convert a bunch of modules to it [puppet] - 10https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078)
[14:46:26] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: prometheus: convert to systemd::service where needed [puppet] - 10https://gerrit.wikimedia.org/r/371482 (https://phabricator.wikimedia.org/T173078)
[14:46:28] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: varnish: convert to systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/371617 (https://phabricator.wikimedia.org/T173078)
[14:46:30] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: thumbor,swift: convert to systemd::service and systemd::unit [puppet] - 10https://gerrit.wikimedia.org/r/371618 (https://phabricator.wikimedia.org/T173078)
[14:46:32] <wikibugs>	 (03PS1) 10Giuseppe Lavagetto: base::service_unit: convert services to systemd::service [puppet] - 10https://gerrit.wikimedia.org/r/371619 (https://phabricator.wikimedia.org/T173078)
[15:06:30] <icinga-wm>	 PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0]
[15:07:06] <ebernhardson>	 that's just 1017 again ...
[15:13:58] <wikibugs>	 (03PS20) 10Rush: tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[15:16:39] <icinga-wm>	 RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[15:23:31] <elukey>	 mw2256 is different this time, load avg to the roof.. I am able to see com2 but not ssh or log in as root
[15:25:03] <elukey>	 !log powercycle mw2256 (able to use com2 but not to login as root, regular ssh hanging) - T163346
[15:25:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:17] <stashbot>	 T163346: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346
[15:25:37] <wikibugs>	 (03CR) 10Daniel Kinzler: [C: 031] "yes, please" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371067 (owner: 10Thiemo Mättig (WMDE))
[15:26:28] <wikibugs>	 (03CR) 10Daniel Kinzler: [C: 031] "yes please" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371069 (owner: 10Thiemo Mättig (WMDE))
[15:26:49] <icinga-wm>	 RECOVERY - dhclient process on mw2256 is OK: PROCS OK: 0 processes with command name dhclient
[15:26:49] <icinga-wm>	 RECOVERY - configured eth on mw2256 is OK: OK - interfaces up
[15:26:50] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on mw2256 is OK: OK ferm input default policy is set
[15:27:00] <icinga-wm>	 RECOVERY - MD RAID on mw2256 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[15:27:09] <icinga-wm>	 RECOVERY - nutcracker process on mw2256 is OK: PROCS OK: 1 process with UID = 114 (nutcracker), command name nutcracker
[15:27:09] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw2256 is OK: OK: nf_conntrack is 0 % full
[15:27:09] <icinga-wm>	 RECOVERY - DPKG on mw2256 is OK: All packages OK
[15:27:09] <icinga-wm>	 RECOVERY - SSH on mw2256 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[15:27:19] <icinga-wm>	 RECOVERY - Disk space on mw2256 is OK: DISK OK
[15:27:29] <icinga-wm>	 RECOVERY - Check systemd state on mw2256 is OK: OK - running: The system is fully operational
[15:27:29] <icinga-wm>	 RECOVERY - HHVM processes on mw2256 is OK: PROCS OK: 6 processes with command name hhvm
[15:27:29] <icinga-wm>	 RECOVERY - salt-minion processes on mw2256 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:27:39] <icinga-wm>	 RECOVERY - nutcracker port on mw2256 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[15:29:25] <elukey>	 ah some traces in syslog, gooood
[15:35:10] <wikibugs>	 10Operations, 10ops-codfw: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346#3520737 (10elukey) This time the host showed a sudden increase in load average and I can see this in the syslog at around the same time:  {F9045686}  ``` Aug 11 21:21:38 mw2256 kernel: [109231.690343] BUG: stack guar...
[15:37:30] <elukey>	 all right will investigate more on monday :)
[15:37:40] <moritzm>	 I think the stack trace is just fallout of the hardware error
[15:38:02] <elukey>	 hello moritzm!
[15:38:32] <moritzm>	 hi :-)
[15:38:42] <moritzm>	 let's keep it depooled and work Papaul through it with Dell
[15:39:07] <elukey>	 it is pooled now, I can set it inactive
[15:39:34] <moritzm>	 this will likely crash again over the weekend, let's rather set it to inactive again
[15:39:39] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on mw2256 is OK: OK: synced at Sat 2017-08-12 15:39:33 UTC.
[15:40:10] <elukey>	 inactive :)
[15:40:57] <elukey>	 I am wondering if firmware/bios upgrade + thermal paste changed anything
[15:41:27] <elukey>	 like the fact that now it doesn't completely freeze but it shows some trace of errors
[15:41:28] <Reedy>	 it changed the thermal paste :P
[15:41:41] <moritzm>	 probably, maybe the SOS will show the error to Dell
[15:44:49] <elukey>	 all right enjoy your saturday people! ttl :)
[15:47:33] <moritzm>	 right, see you on Monday
[15:52:39] <icinga-wm>	 RECOVERY - IPMI Temperature on mw2256 is OK: Sensor Type(s) Temperature Status: OK
[16:09:52] <Reedy>	 !log Deleted some bogus user languages from commonswiki.user_properties
[16:10:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:49] <icinga-wm>	 PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1950 bytes in 0.104 second response time
[16:37:49] <icinga-wm>	 RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1950 bytes in 0.121 second response time
[16:48:45] <wikibugs>	 (03PS1) 10Jforrester: Enable responsive reference columns on enwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371630 (https://phabricator.wikimedia.org/T173176)
[17:20:40] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.79 seconds
[17:21:39] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 324.99 seconds
[17:29:33] <wikibugs>	 (03CR) 10Rush: [C: 032] tools: job to copytruncate logs in place [puppet] - 10https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: 10Rush)
[17:31:40] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.02 seconds
[17:32:49] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.03 seconds
[17:51:09] <wikibugs>	 (03PS1) 10Rush: tools: followup to  326153 [puppet] - 10https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235)
[17:51:26] <wikibugs>	 (03Abandoned) 10Rush: tool: convert HBA source host mechanism to static [puppet] - 10https://gerrit.wikimedia.org/r/334203 (https://phabricator.wikimedia.org/T156168) (owner: 10Rush)
[17:51:39] <wikibugs>	 (03PS2) 10Rush: tools: followup to  326153 [puppet] - 10https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235)
[17:52:23] <wikibugs>	 (03CR) 10Rush: [C: 032] tools: followup to  326153 [puppet] - 10https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235) (owner: 10Rush)
[17:53:49] <icinga-wm>	 PROBLEM - puppet last run on labstore1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:54:49] <icinga-wm>	 RECOVERY - puppet last run on labstore1005 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[17:54:59] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.63 seconds
[17:55:02] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.23 seconds
[18:02:00] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:02:09] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:10:17] <wikibugs>	 (03PS1) 10Rush: tools: remote logcleanup active cron for now [puppet] - 10https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235)
[18:11:37] <wikibugs>	 (03PS2) 10Rush: tools: remote logcleanup active cron for now [puppet] - 10https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235)
[18:14:13] <wikibugs>	 (03CR) 10Rush: [C: 032] tools: remote logcleanup active cron for now [puppet] - 10https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235) (owner: 10Rush)
[18:14:15] <wikibugs>	 (03PS1) 10Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636
[18:16:53] <wikibugs>	 (03PS2) 10Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[18:23:39] <wikibugs>	 10Operations, 10Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3521172 (10Multichill) @fgiunchedi what do you think are the risks? Number of incoming images maybe? Haven't seen any issues in that area for a long time. Maybe som...
[18:55:12] <wikibugs>	 10Operations, 10MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3521217 (10eranroz) p:05High>03Unbreak!
[19:31:02] <wikibugs>	 (03CR) 10Ema: "Looks good, just a few minor comments." (033 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710) (owner: 10Mark Bergsma)
[19:34:30] <wikibugs>	 (03CR) 10Ema: [C: 04-1] "> Looks good, just a few minor comments." [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710) (owner: 10Mark Bergsma)
[19:36:45] <wikibugs>	 (03PS1) 10Filippo Giunchedi: udev: new module [puppet] - 10https://gerrit.wikimedia.org/r/371642
[19:38:05] <wikibugs>	 (03Abandoned) 10Filippo Giunchedi: profile: fix udev reload dependency for swift::storage::labs [puppet] - 10https://gerrit.wikimedia.org/r/371582 (owner: 10Filippo Giunchedi)
[19:41:04] <wikibugs>	 (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/7420/ms-be1030.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/371642 (owner: 10Filippo Giunchedi)
[19:50:49] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:52:38] <Krenair>	 bblack, are you at wikimania?
[19:53:31] <wikibugs>	 (03PS1) 10Mobrovac: Add /home/ files for mobrovac [puppet] - 10https://gerrit.wikimedia.org/r/371644
[19:54:57] <Krenair>	 bblack, I wrote https://gerrit.wikimedia.org/r/#/c/317450/ a while ago and am thinking about what might be missing
[19:56:02] <wikibugs>	 (03PS2) 10Mobrovac: Add /home/ files for mobrovac [puppet] - 10https://gerrit.wikimedia.org/r/371644
[19:57:17] <wikibugs>	 (03PS3) 10Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:00:52] <logmsgbot>	 !log krinkle@tin Synchronized php-1.30.0-wmf.13/includes/jobqueue/JobQueueGroup.php: T171371 - Log job pushes to bogus wikis (duration: 00m 53s)
[20:01:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:06] <stashbot>	 T171371: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371
[20:03:41] <wikibugs>	 (03PS4) 10Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:03:55] <mark>	 ema: ^ try that one?
[20:04:02] <mark>	 i can't easily test epoll atm
[20:05:13] <wikibugs>	 (03PS3) 10Mobrovac: Add /home/ files for mobrovac [puppet] - 10https://gerrit.wikimedia.org/r/371644
[20:06:27] <ema>	 mark: trying
[20:06:43] <wikibugs>	 (03PS4) 10Mobrovac: Add /home/ files for mobrovac [puppet] - 10https://gerrit.wikimedia.org/r/371644
[20:06:45] <ema>	 mark: it works!
[20:06:48] <mark>	 hm
[20:06:51] <mark>	 but that 4th argument
[20:06:54] <mark>	 I can add it to labels I guess
[20:06:57] <mark>	 event
[20:06:59] <ema>	 mark: what's the "method" label?
[20:07:00] <mark>	 gimme a sec
[20:07:04] <mark>	 doRead or doWrite
[20:07:12] <ema>	 like method=3
[20:07:18] <ema>	 pybal_reactor_do_read_or_write_duration_count{method="3",selectable="Client"} 6.0
[20:07:21] <ema>	 pybal_reactor_do_read_or_write_duration_count{method="3",selectable="Client"} 6.0
[20:07:40] <wikibugs>	 (03CR) 10Ema: [C: 032] Add /home/ files for mobrovac [puppet] - 10https://gerrit.wikimedia.org/r/371644 (owner: 10Mobrovac)
[20:08:32] <wikibugs>	 (03PS5) 10Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - 10https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:08:39] <mark>	 ^ that one adds 'event'
[20:08:52] <mark>	 ema: oh, that's different with Select
[20:08:58] <mark>	 epoll specific I guess
[20:09:39] <icinga-wm>	 PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:09:59] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[20:10:09] <wikibugs>	 10Operations, 10MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3521370 (10MoritzMuehlenhoff) For the current status I'm not sure. In T171267 @tstarling mentioned tests in deployment-prep. If those were successful, we proceed with the act...
[20:13:33] <mark>	 hm
[20:15:34] <wikibugs>	 (03PS1) 10Mobrovac: home/mobrovac: Add dir for temp vim files [puppet] - 10https://gerrit.wikimedia.org/r/371647
[20:15:39] <icinga-wm>	 RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:16:38] <wikibugs>	 (03CR) 10Ema: [C: 032] home/mobrovac: Add dir for temp vim files [puppet] - 10https://gerrit.wikimedia.org/r/371647 (owner: 10Mobrovac)
[20:16:59] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:19:06] <petan>	 !ping
[20:19:14] <petan>	 !ping is pong
[20:19:14] <goat-bot4>	 Key was added
[20:23:29] <icinga-wm>	 RECOVERY - puppet last run on restbase1014 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[20:25:25] <mark>	 ema: i'm also thinking we should only instrument the reactor on demand
[20:25:29] <mark>	 perhaps a cli switch?
[20:25:33] <mark>	 it's extra overhead
[20:25:39] <mark>	 on every read/write :P
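Editor's note: the idea mark raises here, instrumenting the reactor only on demand so that normal runs avoid the per-call overhead, can be sketched roughly as below. This is not the pybal change under review. The flag name `--instrument-reactor`, the `STATS` dictionary (a plain in-memory stand-in for Prometheus metrics), and the `instrument` helper are all hypothetical names chosen for illustration, and the wrapped function is assumed to take `(selectable, method)` as its first two arguments.

```python
import argparse
import functools
import time
from collections import defaultdict

# Hypothetical in-memory stand-in for Prometheus metrics:
# maps (selectable class name, method) to [call_count, total_seconds].
STATS = defaultdict(lambda: [0, 0.0])


def instrument(func, enabled, clock=time.perf_counter):
    """Return func untouched when disabled, otherwise a timing wrapper."""
    if not enabled:
        # No wrapper at all, so the disabled case adds no per-call cost.
        return func

    @functools.wraps(func)
    def wrapper(selectable, method, *args, **kwargs):
        start = clock()
        try:
            return func(selectable, method, *args, **kwargs)
        finally:
            # Record the call even if func raised.
            entry = STATS[(type(selectable).__name__, str(method))]
            entry[0] += 1
            entry[1] += clock() - start

    return wrapper


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; not an actual pybal option.
    parser.add_argument("--instrument-reactor", action="store_true")
    return parser.parse_args(argv)
```

Deciding once at startup, rather than checking a flag inside the hot path, means the uninstrumented case calls the original function directly.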
[21:24:20] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:53:30] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:05:14] <wikibugs>	 (03PS1) 10Greg Grossmeier: Phabricator: Override the frog token's label [puppet] - 10https://gerrit.wikimedia.org/r/371660 (https://phabricator.wikimedia.org/T173208)
[22:06:16] <wikibugs>	 (03CR) 10Awight: [C: 031] "<3" [puppet] - 10https://gerrit.wikimedia.org/r/371660 (https://phabricator.wikimedia.org/T173208) (owner: 10Greg Grossmeier)
[22:13:40] <wikibugs>	 10Operations, 10Ops-Access-Requests: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (10Legoktm)
[22:13:56] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521696 (10Legoktm)
[22:15:57] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (10hoo) +1
[22:16:08] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521705 (10daniel) Would be useful to have the permissions, yes. Even though I really don't want to be responsible for the Wikidata build ;)
[22:18:53] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (10Reedy) Guess @daniel has already signed NDA's etc  Should just be swapping daniel restricted -> deployment
[22:18:56] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (10greg) But it's Saturday!  Seriously though, deployer like "I know the basics and can push out a change I need by myself without help from releng/someone wh...
[22:20:01] <wikibugs>	 (03Draft2) 10Reedy: Make daniel a deployer [puppet] - 10https://gerrit.wikimedia.org/r/371661 (https://phabricator.wikimedia.org/T173230)
[22:20:32] <wikibugs>	 10Operations, 10Ops-Access-Requests, 10Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521711 (10Reedy) https://gerrit.wikimedia.org/r/#/c/371661/
[22:20:40] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:45:22] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:57:10] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Wikistats, 10Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3521735 (10Urbanecm) 05Open>03Resolved Wiki itself is created. CS isn't a blocker.
[23:49:10] <icinga-wm>	 PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.