[01:12:20] PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:10] RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 73987 bytes in 0.140 second response time
[01:41:29] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[02:15:39] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:15:59] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[03:29:20] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 650.51 seconds
[04:15:19] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[04:27:40] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 223.90 seconds
[04:43:29] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:22:50] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:27:59] PROBLEM - Check size of conntrack table on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:27:59] PROBLEM - Disk space on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:28:29] PROBLEM - nutcracker port on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[07:52:39] PROBLEM - IPMI Temperature on mw2256 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:17:26] (PS2) Giuseppe Lavagetto: systemd::service: convert a bunch of modules to it [puppet] - https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078)
[09:17:59] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[10:53:42] >>> UNRECOVERABLE FATAL ERROR <<<
[10:53:44] Undefined class constant 'STATUS_CLOSED'
[10:53:45] /srv/deployment/phabricator/deployment-cache/revs/3d728e1f6bb6c82cc46d3b062c2d0f49f0823694/phabricator/src/applications/differential/query/DifferentialRevisionSearchEngine.php:137
[13:44:59] PROBLEM - MariaDB Slave Lag: s1 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 607.73 seconds
[13:49:09] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 845.36 seconds
[13:56:09] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 224.83 seconds
[14:05:09] RECOVERY - MariaDB Slave Lag: s1 on db1047 is OK: OK slave_sql_lag Replication lag: 12.98 seconds
[14:26:11] (CR) Krinkle: systemd::service: convert a bunch of modules to it (1 comment) [puppet] - https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078) (owner: Giuseppe Lavagetto)
[14:29:18] Operations, Android-app-feature-Compilations, Reading-Infrastructure-Team-Backlog, Traffic, and 2 others: Determine how to upload Zim files to Swift infrastructure - https://phabricator.wikimedia.org/T172123#3520604 (Mholloway)
[14:29:20] Operations, Android-app-feature-Compilations, Reading-Infrastructure-Team-Backlog, Wikipedia-Android-App-Backlog: Create 'pagecompilation' Swift account(s) (beta + prod) for Readers offline article compilations project - https://phabricator.wikimedia.org/T172735#3520603 (Mholloway) Open>Re...
[14:34:23] (PS17) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:34:50] (CR) jerkins-bot: [V: -1] tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: Rush)
[14:39:55] (PS18) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:40:22] (CR) jerkins-bot: [V: -1] tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: Rush)
[14:42:34] (PS19) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[14:46:24] (PS3) Giuseppe Lavagetto: systemd::service: convert a bunch of modules to it [puppet] - https://gerrit.wikimedia.org/r/371481 (https://phabricator.wikimedia.org/T173078)
[14:46:26] (PS2) Giuseppe Lavagetto: prometheus: convert to systemd::service where needed [puppet] - https://gerrit.wikimedia.org/r/371482 (https://phabricator.wikimedia.org/T173078)
[14:46:28] (PS1) Giuseppe Lavagetto: varnish: convert to systemd::service [puppet] - https://gerrit.wikimedia.org/r/371617 (https://phabricator.wikimedia.org/T173078)
[14:46:30] (PS1) Giuseppe Lavagetto: thumbor,swift: convert to systemd::service and systemd::unit [puppet] - https://gerrit.wikimedia.org/r/371618 (https://phabricator.wikimedia.org/T173078)
[14:46:32] (PS1) Giuseppe Lavagetto: base::service_unit: convert services to systemd::service [puppet] - https://gerrit.wikimedia.org/r/371619 (https://phabricator.wikimedia.org/T173078)
[15:06:30] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0]
[15:07:06] that's just 1017 again ...
[15:13:58] (PS20) Rush: tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235)
[15:16:39] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[15:23:31] mw2256 is different this time, load avg to the roof.. I am able to see com2 but not ssh or log in as root
[15:25:03] !log powercycle mw2256 (able to use com2 but not to login as root, regular ssh hanging) - T163346
[15:25:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:17] T163346: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346
[15:25:37] (CR) Daniel Kinzler: [C: 1] "yes, please" [mediawiki-config] - https://gerrit.wikimedia.org/r/371067 (owner: Thiemo Mättig (WMDE))
[15:26:28] (CR) Daniel Kinzler: [C: 1] "yes please" [mediawiki-config] - https://gerrit.wikimedia.org/r/371069 (owner: Thiemo Mättig (WMDE))
[15:26:49] RECOVERY - dhclient process on mw2256 is OK: PROCS OK: 0 processes with command name dhclient
[15:26:49] RECOVERY - configured eth on mw2256 is OK: OK - interfaces up
[15:26:50] RECOVERY - Check whether ferm is active by checking the default input chain on mw2256 is OK: OK ferm input default policy is set
[15:27:00] RECOVERY - MD RAID on mw2256 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[15:27:09] RECOVERY - nutcracker process on mw2256 is OK: PROCS OK: 1 process with UID = 114 (nutcracker), command name nutcracker
[15:27:09] RECOVERY - Check size of conntrack table on mw2256 is OK: OK: nf_conntrack is 0 % full
[15:27:09] RECOVERY - DPKG on mw2256 is OK: All packages OK
[15:27:09] RECOVERY - SSH on mw2256 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u3 (protocol 2.0)
[15:27:19] RECOVERY - Disk space on mw2256 is OK: DISK OK
[15:27:29] RECOVERY - Check systemd state on mw2256 is OK: OK - running: The system is fully operational
[15:27:29] RECOVERY - HHVM processes on mw2256 is OK: PROCS OK: 6 processes with command name hhvm
[15:27:29] RECOVERY - salt-minion processes on mw2256 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:27:39] RECOVERY - nutcracker port on mw2256 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[15:29:25] ah some traces in syslog, gooood
[15:35:10] Operations, ops-codfw: mw2256 - hardware issue - https://phabricator.wikimedia.org/T163346#3520737 (elukey) This time the host showed a sudden increase in load average and I can see this in the syslog at around the same time: {F9045686} ``` Aug 11 21:21:38 mw2256 kernel: [109231.690343] BUG: stack guar...
[15:37:30] all right will investigate more on monday :)
[15:37:40] I think the stack trace is just fallout of the hardware error
[15:38:02] hello moritzm!
[15:38:32] hi :-)
[15:38:42] let's keep it depooled and work Papaul through it with Dell
[15:39:07] it is pooled now, I can set it inactive
[15:39:34] this will likely crash again over the weekend, let's rather set it to inactive again
[15:39:39] RECOVERY - Check the NTP synchronisation status of timesyncd on mw2256 is OK: OK: synced at Sat 2017-08-12 15:39:33 UTC.
[15:40:10] inactive :)
[15:40:57] I am wondering if firmware/bios upgrade + thermal paste changed anything
[15:41:27] like the fact that now it doesn't completely freeze but it shows some trace of errors
[15:41:28] it changed the thermal paste :P
[15:41:41] probably, maybe the SOS will show the error to Dell
[15:44:49] all right enjoy your saturday people! ttl :)
[15:47:33] right, see you on Monday
[15:52:39] RECOVERY - IPMI Temperature on mw2256 is OK: Sensor Type(s) Temperature Status: OK
[16:09:52] !log Deleted some bogus user languages from commonswiki.user_properties
[16:10:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:32:49] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1950 bytes in 0.104 second response time
[16:37:49] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1950 bytes in 0.121 second response time
[16:48:45] (PS1) Jforrester: Enable responsive reference columns on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/371630 (https://phabricator.wikimedia.org/T173176)
[17:20:40] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.79 seconds
[17:21:39] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 324.99 seconds
[17:29:33] (CR) Rush: [C: 2] tools: job to copytruncate logs in place [puppet] - https://gerrit.wikimedia.org/r/326153 (https://phabricator.wikimedia.org/T152235) (owner: Rush)
[17:31:40] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.02 seconds
[17:32:49] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.03 seconds
[17:51:09] (PS1) Rush: tools: followup to 326153 [puppet] - https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235)
[17:51:26] (Abandoned) Rush: tool: convert HBA source host mechanism to static [puppet] - https://gerrit.wikimedia.org/r/334203 (https://phabricator.wikimedia.org/T156168) (owner: Rush)
[17:51:39] (PS2) Rush: tools: followup to 326153 [puppet] - https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235)
[17:52:23] (CR) Rush: [C: 2] tools: followup to 326153 [puppet] - https://gerrit.wikimedia.org/r/371633 (https://phabricator.wikimedia.org/T152235) (owner: Rush)
[17:53:49] PROBLEM - puppet last run on labstore1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:54:49] RECOVERY - puppet last run on labstore1005 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[17:54:59] PROBLEM - MariaDB Slave Lag: s4 on db2044 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.63 seconds
[17:55:02] PROBLEM - MariaDB Slave Lag: s4 on db2037 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 300.23 seconds
[18:02:00] RECOVERY - MariaDB Slave Lag: s4 on db2044 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:02:09] RECOVERY - MariaDB Slave Lag: s4 on db2037 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[18:10:17] (PS1) Rush: tools: remote logcleanup active cron for now [puppet] - https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235)
[18:11:37] (PS2) Rush: tools: remote logcleanup active cron for now [puppet] - https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235)
[18:14:13] (CR) Rush: [C: 2] tools: remote logcleanup active cron for now [puppet] - https://gerrit.wikimedia.org/r/371634 (https://phabricator.wikimedia.org/T152235) (owner: Rush)
[18:14:15] (PS1) Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371636
[18:16:53] (PS2) Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[18:23:39] Operations, Wiki-Loves-Monuments (2017): Import Wiki Loves Monuments photos from Flickr to Commons - https://phabricator.wikimedia.org/T173056#3521172 (Multichill) @fgiunchedi what do you think are the risks? Number of incoming images maybe? Haven't seen any issues in that area for a long time. Maybe som...
[18:55:12] Operations, MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3521217 (eranroz) p:High>Unbreak!
[19:31:02] (CR) Ema: "Looks good, just a few minor comments." (3 comments) [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma)
[19:34:30] (CR) Ema: [C: -1] "> Looks good, just a few minor comments." [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710) (owner: Mark Bergsma)
[19:36:45] (PS1) Filippo Giunchedi: udev: new module [puppet] - https://gerrit.wikimedia.org/r/371642
[19:38:05] (Abandoned) Filippo Giunchedi: profile: fix udev reload dependency for swift::storage::labs [puppet] - https://gerrit.wikimedia.org/r/371582 (owner: Filippo Giunchedi)
[19:41:04] (CR) Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/7420/ms-be1030.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/371642 (owner: Filippo Giunchedi)
[19:50:49] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:52:38] bblack, are you at wikimania?
[19:53:31] (PS1) Mobrovac: Add /home/ files for mobrovac [puppet] - https://gerrit.wikimedia.org/r/371644
[19:54:57] bblack, I wrote https://gerrit.wikimedia.org/r/#/c/317450/ a while ago and am thinking about what might be missing
[19:56:02] (PS2) Mobrovac: Add /home/ files for mobrovac [puppet] - https://gerrit.wikimedia.org/r/371644
[19:57:17] (PS3) Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:00:52] !log krinkle@tin Synchronized php-1.30.0-wmf.13/includes/jobqueue/JobQueueGroup.php: T171371 - Log job pushes to bogus wikis (duration: 00m 53s)
[20:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:01:06] T171371: Investigate 30x increase in Jobrunner errors - https://phabricator.wikimedia.org/T171371
[20:03:41] (PS4) Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:03:55] ema: ^ try that one?
[20:04:02] i can't easily test epoll atm
[20:05:13] (PS3) Mobrovac: Add /home/ files for mobrovac [puppet] - https://gerrit.wikimedia.org/r/371644
[20:06:27] mark: trying
[20:06:43] (PS4) Mobrovac: Add /home/ files for mobrovac [puppet] - https://gerrit.wikimedia.org/r/371644
[20:06:45] mark: it works!
[20:06:48] hm
[20:06:51] but that 4th argument
[20:06:54] I can add it to labels I guess
[20:06:57] event
[20:06:59] mark: what's the "method" label?
[20:07:00] gimme a sec
[20:07:04] doRead or doWrite
[20:07:12] like method=3
[20:07:18] pybal_reactor_do_read_or_write_duration_count{method="3",selectable="Client"} 6.0
[20:07:21] pybal_reactor_do_read_or_write_duration_count{method="3",selectable="Client"} 6.0
[20:07:40] (CR) Ema: [C: 2] Add /home/ files for mobrovac [puppet] - https://gerrit.wikimedia.org/r/371644 (owner: Mobrovac)
[20:08:32] (PS5) Mark Bergsma: Instrument the Twisted reactor with Prometheus metrics [debs/pybal] - https://gerrit.wikimedia.org/r/371636 (https://phabricator.wikimedia.org/T171710)
[20:08:39] ^ that one adds 'event'
[20:08:52] ema: oh, that's different with Select
[20:08:58] epoll specific I guess
[20:09:39] PROBLEM - Upload HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:09:59] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[20:10:09] Operations, MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3521370 (MoritzMuehlenhoff) For the current status I'm not sure. In T171267 @tstarling mentioned tests in deployment-prep. If those were successful, we proceed with the act...
[20:13:33] hm
[20:15:34] (PS1) Mobrovac: home/mobrovac: Add dir for temp vim files [puppet] - https://gerrit.wikimedia.org/r/371647
[20:15:39] RECOVERY - Upload HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:16:38] (CR) Ema: [C: 2] home/mobrovac: Add dir for temp vim files [puppet] - https://gerrit.wikimedia.org/r/371647 (owner: Mobrovac)
[20:16:59] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[20:19:06] !ping
[20:19:14] !ping is pong
[20:19:14] Key was added
[20:23:29] RECOVERY - puppet last run on restbase1014 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[20:25:25] ema: i'm also thinking we should only instrument the reactor on demand
[20:25:29] perhaps a cli switch?
[20:25:33] it's extra overhead
[20:25:39] on every read/write :P
[21:24:20] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[21:53:30] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:05:14] (PS1) Greg Grossmeier: Phabricator: Override the frog token's label [puppet] - https://gerrit.wikimedia.org/r/371660 (https://phabricator.wikimedia.org/T173208)
[22:06:16] (CR) Awight: [C: 1] "<3" [puppet] - https://gerrit.wikimedia.org/r/371660 (https://phabricator.wikimedia.org/T173208) (owner: Greg Grossmeier)
[22:13:40] Operations, Ops-Access-Requests: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (Legoktm)
[22:13:56] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521696 (Legoktm)
[22:15:57] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (hoo) +1
[22:16:08] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521705 (daniel) Would be useful to have the permissions, yes. Even though I really don't want to be responsible for the Wikidata build ;)
[22:18:53] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (Reedy) Guess @daniel has already signed NDA's etc Should just be swapping daniel restricted -> deployment
[22:18:56] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521683 (greg) But it's Saturday! Seriously though, deployer like "I know the basics and can push out a change I need by myself without help from releng/someone wh...
[22:20:01] (Draft2) Reedy: Make daniel a deployer [puppet] - https://gerrit.wikimedia.org/r/371661 (https://phabricator.wikimedia.org/T173230)
[22:20:32] Operations, Ops-Access-Requests, Release-Engineering-Team: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3521711 (Reedy) https://gerrit.wikimedia.org/r/#/c/371661/
[22:20:40] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:45:22] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[22:57:10] Operations, Analytics, Analytics-Wikistats, Wikidata, and 6 others: Create Wikiversity Hindi - https://phabricator.wikimedia.org/T168765#3521735 (Urbanecm) Open>Resolved Wiki itself is created. CS isn't a blocker.
[23:49:10] PROBLEM - Disk space on graphite1003 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.