[00:01:14] ok, i am now, i was still waiting for my last attempt
[00:01:30] ok thanks
[00:02:39] shouldn't that have --slave though
[00:03:25] ah right yeh
[00:05:04] nothing..
[00:08:16] RECOVERY - puppet last run on labtestweb2001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[00:09:45] paladox: /var/lib/gerrit2/review_site/bin/gerrit.war vs /var/lib/gerrit2/gerrit.war -> /srv/deployment/gerrit/gerrit/gerrit.war
[00:09:57] /var/lib/gerrit2/gerrit.war is the newer one
[00:10:04] /var/lib/gerrit2/review_site/bin/gerrit.war is the correct one
[00:10:18] you run init on /var/lib/gerrit2/gerrit.war, which upgrades /var/lib/gerrit2/review_site/bin/gerrit.war
[00:10:39] paladox: the one you call correct hasn't been changed since Jun 21
[00:10:54] i think so
[00:11:02] the other one on Sep 18
[00:11:12] and it's a link into /srv/deployment
[00:11:18] yeh, that would be a scap
[00:11:18] because scap
[00:11:19] when we did scap
[00:11:28] so.. how can the non-scap one be the correct one
[00:11:49] because the scap one will cause prod to break
[00:11:53] when we do a scap deploy
[00:12:04] ..
[00:12:06] which is why it is safer to use review_site/bin/gerrit.war
[00:12:17] as scap won't change that one
[00:12:27] though maybe do an init?
[00:12:31] ?
[00:12:39] sudo su gerrit2
[00:12:40] java -jar gerrit.war init -d review_site
[00:12:49] cd /var/lib/gerrit2
[00:12:50] java -jar gerrit.war init -d review_site
[00:13:11] so you are now saying the other one is correct
[00:13:22] the one that links to /srv/deployment
[00:14:07] yep, maybe, though we can't rely on /var/lib/gerrit2/gerrit.war since it will cause an outage if we do a scap deploy, so when we do init, it will copy it into review_site/bin/gerrit.war
[00:14:57] oh, ok, trying that
[00:16:22] ok thanks
[00:16:31] Upgrade review_site/bin/gerrit.war [Y/n]?
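[editor's note: a minimal sketch of the init sequence discussed above, using the paths from this log. The function only *prints* the commands (a dry run), since running `gerrit.war init` for real is interactive and host-specific; the helper name `gerrit_init_plan` is made up for illustration.]

```shell
# Dry-run sketch of the upgrade sequence from the log: run as gerrit2
# from /var/lib/gerrit2 so ./gerrit.war resolves to the scap-deployed
# symlink; init then upgrades review_site/bin/gerrit.war in place
# (answer "n" to the plugin prompts, since scap provides the plugins).
gerrit_init_plan() {
    local gerrit_home=$1 site=$2
    echo "cd $gerrit_home"
    echo "java -jar gerrit.war init -d $site"
}

gerrit_init_plan /var/lib/gerrit2 review_site
```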
[00:16:34] that one you mean, right
[00:16:38] among all the other questions
[00:16:55] yeh
[00:17:17] keep pressing enter, apart from where you get to the plugins
[00:17:18] put n for plugins
[00:17:19] as scap is providing them
[00:18:08] yea, N was default for all of the "install plugin" questions
[00:18:22] guess what.. at the very end of that whole dialog... Exception in thread "main" java.io.IOException: Permission denied
[00:18:47] aha
[00:19:00] hmm
[00:19:22] does it say which file has permission problems?
[00:19:33] no :p
[00:19:51] hmm
[00:20:04] eh, it tries to create the lockfile
[00:20:08] and can't
[00:20:25] does it try to write it to the deployment dir.. i bet that's it
[00:20:26] aha
[00:20:34] it's the index
[00:20:42] it's saying gerrit needs to be stopped
[00:20:49] oh wait
[00:20:51] lockfile
[00:20:56] index/?
[00:22:07] ls -la /var/lib/gerrit2/review_site/cache/
[00:22:09] index is also writable by gerrit2
[00:22:13] what permissions are in ^^
[00:22:24] all owned by gerrit2:gerrit2
[00:22:40] ok
[00:37:16] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628916 (10Paladox)
[00:39:09] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628931 (10Paladox)
[00:39:44] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628932 (10Dzahn) Yea, we already tried a whole bunch of things but couldn't get the gerrit service to properly start. cobalt (prod gerrit master) is untouched, this is just th...
[00:41:36] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:50:04] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628936 (10Dzahn) Correction: I can get gerrit.service itself back to state running and have no errors.. just we never get the gerrit-ssh service on port 29418. Now a ques...
[00:51:23] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628937 (10Dzahn)
[01:09:46] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[01:14:39] 10Operations, 10Ops-Access-Requests: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3628944 (10RobH) a:05Zoranzoki21>03None
[01:23:37] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 14 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[gerrit]
[01:29:17] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628961 (10Paladox) But it's not writing to the logs, so that is a clue there's a bigger problem.
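[editor's note: two quick checks matching the debugging above — the "Permission denied" hunt through the review_site directories, and "did gerrit-ssh come up on 29418?". A hedged sketch: the helper names are made up, the paths and port are the ones from the log, and `port_open` relies on bash's `/dev/tcp` pseudo-device.]

```shell
check_writable() {
    # Print whether each given path is writable by the current user;
    # run this as gerrit2 to spot the directory that makes init fail.
    for p in "$@"; do
        if [ -w "$p" ]; then
            echo "writable: $p"
        else
            echo "NOT writable: $p"
        fi
    done
}

port_open() {
    # bash-only: opening /dev/tcp/<host>/<port> succeeds only if
    # something is listening there (connection refused => nonzero).
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Example, with the paths and port from the log:
# check_writable /var/lib/gerrit2/review_site/bin \
#                /var/lib/gerrit2/review_site/cache \
#                /var/lib/gerrit2/review_site/index
# port_open localhost 29418 && echo "gerrit-ssh is up"
```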
[02:26:27] RECOVERY - MariaDB Slave SQL: s4 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:27] RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:37] RECOVERY - MariaDB Slave SQL: m3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:37] RECOVERY - MariaDB Slave IO: s1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:37] RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:47] RECOVERY - MariaDB Slave IO: s5 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:47] RECOVERY - MariaDB Slave IO: s6 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:56] RECOVERY - MariaDB Slave IO: s7 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:57] RECOVERY - MariaDB Slave SQL: x1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:57] RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:57] RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:57] RECOVERY - MariaDB Slave SQL: s5 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:57] RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[02:26:57] RECOVERY - MariaDB Slave IO: s3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:26:58] RECOVERY - MariaDB Slave IO: m2 on dbstore1001 is OK: OK slave_io_state not a slave
[02:26:58] RECOVERY - MariaDB Slave SQL: s7 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:59] RECOVERY - MariaDB Slave SQL: s3 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:26:59] RECOVERY - MariaDB Slave IO: s4 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:27:00] RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[02:27:16] RECOVERY - MariaDB Slave IO: x1 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: No, (no error: intentional)
[02:27:16] RECOVERY - MariaDB Slave Lag: m2 on dbstore1001 is OK: OK slave_sql_lag not a slave
[02:35:56] PROBLEM - MegaRAID on labsdb1001 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[02:39:19] 10Operations, 10Gerrit, 10Release-Engineering-Team: Gerrit is failing to start gerrit-ssh on gerrit2001 - https://phabricator.wikimedia.org/T176532#3628968 (10Dzahn) and..after a while it dies again just by itself..
[06:46:08] (03PS3) 10Ladsgroup: Add config for amwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378400 (https://phabricator.wikimedia.org/T176042)
[06:46:56] (03Abandoned) 10Ladsgroup: Add amwikimedia to s3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378401 (https://phabricator.wikimedia.org/T176042) (owner: 10Ladsgroup)
[07:08:27] PROBLEM - MariaDB Slave Lag: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 180000.84 seconds
[07:38:56] RECOVERY - MariaDB Slave Lag: x1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 86413.12 seconds
[08:01:37] PROBLEM - very high load average likely xfs on ms-be2020 is CRITICAL: CRITICAL - load average: 247.42, 113.67, 52.04
[08:03:36] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:04:40] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to stat1005 for Slaporte - https://phabricator.wikimedia.org/T176518#3629031 (10Zoranzoki21) https://gerrit.wikimedia.org/r/#/c/379851/
[08:05:27] PROBLEM - Disk space on ms-be2020 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdk1 is not accessible: Input/output error
[08:05:56] PROBLEM - SSH on ms-be2020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:06:26] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.396 second response time
[08:08:46] RECOVERY - very high load average likely xfs on ms-be2020 is OK: OK - load average: 14.26, 72.43, 63.01
[08:14:36] PROBLEM - DPKG on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:37] PROBLEM - Disk space on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:37] PROBLEM - very high load average likely xfs on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:47] PROBLEM - swift-object-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:47] PROBLEM - swift-account-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:57] PROBLEM - swift-object-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:57] PROBLEM - swift-account-reaper on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:57] PROBLEM - swift-container-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:14:57] PROBLEM - Check the NTP synchronisation status of timesyncd on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:06] PROBLEM - swift-container-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:06] PROBLEM - Check systemd state on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:07] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:16] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:16] PROBLEM - swift-object-updater on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:16] PROBLEM - swift-account-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:17] PROBLEM - swift-container-updater on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:36] PROBLEM - swift-object-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:15:36] PROBLEM - swift-account-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:16:06] PROBLEM - swift-container-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:16:16] RECOVERY - MariaDB Slave Lag: m3 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89500.86 seconds
[08:16:36] PROBLEM - dhclient process on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:17:26] PROBLEM - MD RAID on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:17:26] PROBLEM - configured eth on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:17:36] PROBLEM - DPKG on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[08:17:37] RECOVERY - very high load average likely xfs on ms-be2017 is OK: OK - load average: 33.84, 27.04, 19.25
[08:17:37] RECOVERY - Disk space on ms-be2017 is OK: DISK OK
[08:17:46] RECOVERY - swift-account-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[08:17:46] RECOVERY - swift-object-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[08:17:56] RECOVERY - swift-account-reaper on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[08:17:56] RECOVERY - swift-object-auditor on ms-be2017 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[08:17:56] RECOVERY - swift-container-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[08:17:56] RECOVERY - swift-container-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[08:17:56] RECOVERY - swift-container-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[08:18:06] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2017 is OK: OK ferm input default policy is set
[08:18:06] RECOVERY - Check systemd state on ms-be2017 is OK: OK - running: The system is fully operational
[08:18:06] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[08:18:16] RECOVERY - swift-account-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[08:18:16] RECOVERY - swift-object-updater on ms-be2017 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[08:18:16] RECOVERY - swift-container-updater on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[08:18:16] RECOVERY - configured eth on ms-be2017 is OK: OK - interfaces up
[08:18:16] RECOVERY - MD RAID on ms-be2017 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[08:18:26] RECOVERY - dhclient process on ms-be2017 is OK: PROCS OK: 0 processes with command name dhclient
[08:18:26] RECOVERY - swift-account-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[08:18:26] RECOVERY - swift-object-server on ms-be2017 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[08:18:27] RECOVERY - DPKG on ms-be2017 is OK: All packages OK
[08:44:56] RECOVERY - Check the NTP synchronisation status of timesyncd on ms-be2017 is OK: OK: synced at Sat 2017-09-23 08:44:48 UTC.
[09:19:27] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:20:26] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.384 second response time
[09:24:06] RECOVERY - MariaDB Slave Lag: s6 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89678.45 seconds
[09:28:36] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:29:27] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.376 second response time
[09:41:26] PROBLEM - DPKG on ms-be2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:41:47] PROBLEM - Check size of conntrack table on ms-be2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:41:56] RECOVERY - MariaDB Slave Lag: s4 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89892.08 seconds
[09:41:57] PROBLEM - very high load average likely xfs on ms-be2031 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:42:16] RECOVERY - DPKG on ms-be2031 is OK: All packages OK
[09:42:46] RECOVERY - Check size of conntrack table on ms-be2031 is OK: OK: nf_conntrack is 0 % full
[09:42:49] RECOVERY - very high load average likely xfs on ms-be2031 is OK: OK - load average: 31.91, 32.44, 30.31
[09:45:56] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:47:46] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.914 second response time
[09:49:26] PROBLEM - Check size of conntrack table on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:49:36] PROBLEM - MD RAID on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:49:36] PROBLEM - Disk space on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:49:46] PROBLEM - HP RAID on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:49:47] PROBLEM - Check systemd state on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:06] PROBLEM - very high load average likely xfs on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:06] PROBLEM - configured eth on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:07] PROBLEM - swift-account-server on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:16] PROBLEM - swift-container-replicator on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:16] PROBLEM - swift-object-replicator on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:17] PROBLEM - puppet last run on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:17] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:17] PROBLEM - dhclient process on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:32] PROBLEM - swift-account-replicator on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:32] PROBLEM - swift-object-auditor on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:50:32] PROBLEM - swift-account-auditor on ms-be2039 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[09:51:02] RECOVERY - configured eth on ms-be2039 is OK: OK - interfaces up
[09:51:02] RECOVERY - very high load average likely xfs on ms-be2039 is OK: OK - load average: 32.92, 27.23, 22.76
[09:51:02] RECOVERY - swift-account-server on ms-be2039 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[09:51:03] RECOVERY - swift-container-replicator on ms-be2039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[09:51:12] RECOVERY - swift-object-replicator on ms-be2039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[09:51:12] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2039 is OK: OK ferm input default policy is set
[09:51:12] RECOVERY - puppet last run on ms-be2039 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures
[09:51:12] RECOVERY - dhclient process on ms-be2039 is OK: PROCS OK: 0 processes with command name dhclient
[09:51:12] RECOVERY - swift-account-replicator on ms-be2039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[09:51:13] RECOVERY - swift-object-auditor on ms-be2039 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[09:51:22] RECOVERY - Check size of conntrack table on ms-be2039 is OK: OK: nf_conntrack is 0 % full
[09:51:23] RECOVERY - swift-account-auditor on ms-be2039 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[09:51:32] RECOVERY - Disk space on ms-be2039 is OK: DISK OK
[09:51:43] RECOVERY - Check systemd state on ms-be2039 is OK: OK - running: The system is fully operational
[09:52:32] RECOVERY - MD RAID on ms-be2039 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[10:00:03] RECOVERY - HP RAID on ms-be2039 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[10:02:13] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3629157 (10Lydia_Pintscher) That sounds great! I thought @Ladsgroup had already requested the repository. I'll let him chim...
[10:07:12] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:08:01] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3629158 (10Ladsgroup) We moved the repo for the source code from github to gerrit ([[https://github.com/wikimedia/wikiba.se...
[10:09:02] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.362 second response time
[10:11:10] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3629159 (10Lydia_Pintscher) Ah gotcha. Makes sense.
[10:32:37] PROBLEM - swift-container-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:47] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:47] PROBLEM - configured eth on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:47] PROBLEM - MD RAID on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:57] PROBLEM - swift-account-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:57] PROBLEM - DPKG on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:57] PROBLEM - swift-object-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:57] PROBLEM - dhclient process on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:32:57] PROBLEM - Disk space on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:07] PROBLEM - swift-object-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:07] PROBLEM - swift-account-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:28] PROBLEM - Check systemd state on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:28] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:37] PROBLEM - swift-account-reaper on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:37] PROBLEM - swift-container-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:38] PROBLEM - swift-container-updater on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:38] PROBLEM - swift-object-updater on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:33:38] PROBLEM - swift-account-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:34:57] PROBLEM - very high load average likely xfs on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:35:27] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2017 is OK: OK ferm input default policy is set
[10:35:27] RECOVERY - Check systemd state on ms-be2017 is OK: OK - running: The system is fully operational
[10:35:27] RECOVERY - swift-container-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[10:35:27] RECOVERY - swift-account-reaper on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[10:35:28] RECOVERY - swift-container-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:35:37] RECOVERY - swift-account-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[10:35:37] RECOVERY - swift-container-updater on ms-be2017 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[10:35:37] RECOVERY - swift-object-updater on ms-be2017 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[10:35:37] RECOVERY - configured eth on ms-be2017 is OK: OK - interfaces up
[10:35:37] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[10:35:38] RECOVERY - MD RAID on ms-be2017 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[10:35:47] RECOVERY - DPKG on ms-be2017 is OK: All packages OK
[10:35:47] RECOVERY - swift-account-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[10:35:48] RECOVERY - Disk space on ms-be2017 is OK: DISK OK
[10:35:48] RECOVERY - dhclient process on ms-be2017 is OK: PROCS OK: 0 processes with command name dhclient
[10:35:48] RECOVERY - swift-object-server on ms-be2017 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[10:35:57] RECOVERY - very high load average likely xfs on ms-be2017 is OK: OK - load average: 28.81, 28.79, 22.45
[10:36:07] RECOVERY - swift-object-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:36:07] RECOVERY - swift-account-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[10:41:58] PROBLEM - HP RAID on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:43:48] PROBLEM - puppet last run on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:43:57] PROBLEM - MD RAID on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:37] PROBLEM - Check systemd state on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:37] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:38] PROBLEM - swift-container-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:38] PROBLEM - swift-account-reaper on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:47] PROBLEM - swift-container-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:45:48] PROBLEM - swift-container-updater on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:07] PROBLEM - DPKG on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:07] PROBLEM - swift-account-auditor on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:07] PROBLEM - swift-object-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:08] PROBLEM - very high load average likely xfs on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:17] PROBLEM - swift-account-server on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:18] PROBLEM - swift-object-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:37] RECOVERY - swift-account-reaper on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[10:46:37] RECOVERY - swift-container-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:46:47] PROBLEM - swift-container-replicator on ms-be2017 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:46:47] RECOVERY - swift-container-updater on ms-be2017 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[10:46:47] RECOVERY - puppet last run on ms-be2017 is OK: OK: Puppet is currently enabled, last run 27 minutes ago with 0 failures
[10:46:48] RECOVERY - MD RAID on ms-be2017 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[10:46:57] RECOVERY - DPKG on ms-be2017 is OK: All packages OK
[10:46:57] RECOVERY - swift-account-auditor on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[10:46:57] RECOVERY - swift-object-server on ms-be2017 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[10:46:58] RECOVERY - very high load average likely xfs on ms-be2017 is OK: OK - load average: 41.66, 38.15, 30.11
[10:47:08] RECOVERY - swift-object-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:47:08] RECOVERY - swift-account-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[10:47:28] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2017 is OK: OK ferm input default policy is set
[10:47:28] RECOVERY - Check systemd state on ms-be2017 is OK: OK - running: The system is fully operational
[10:47:37] RECOVERY - swift-container-replicator on ms-be2017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[10:47:37] RECOVERY - swift-container-server on ms-be2017 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[10:56:07] PROBLEM - Disk space on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:07] PROBLEM - swift-container-updater on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:07] PROBLEM - swift-container-server on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:07] PROBLEM - swift-object-replicator on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:08] PROBLEM - swift-container-replicator on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:17] PROBLEM - MD RAID on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:17] PROBLEM - configured eth on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:17] PROBLEM - swift-account-server on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:17] PROBLEM - swift-account-replicator on ms-be2029 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[10:56:57] RECOVERY - Disk space on ms-be2029 is OK: DISK OK
[10:56:57] RECOVERY - swift-container-server on ms-be2029 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[10:56:58] RECOVERY - swift-object-replicator on ms-be2029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:56:58] RECOVERY - swift-container-updater on ms-be2029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[10:57:08] RECOVERY - swift-container-replicator on ms-be2029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[10:57:08] RECOVERY - configured eth on ms-be2029 is OK: OK - interfaces up
[10:57:08] RECOVERY - MD RAID on ms-be2029 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[10:57:08] RECOVERY - swift-account-server on ms-be2029 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[10:57:08] RECOVERY - swift-account-replicator on ms-be2029 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[11:02:37] RECOVERY - HP RAID on ms-be2017 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[11:18:17] RECOVERY - MariaDB Slave Lag: s7 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89968.94 seconds
[11:30:14] (Draft3) Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T176150)
[11:32:04] (PS4) Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971)
[11:45:57] RECOVERY - MegaRAID on labsdb1001 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[11:59:20] (CR) Zoranzoki21: [C: 1] Initial configuration for hi.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/371109 (https://phabricator.wikimedia.org/T173013) (owner: MarcoAurelio)
[12:27:08] PROBLEM - swift-object-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:37] PROBLEM - dhclient process on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:47] PROBLEM - swift-account-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:47] PROBLEM - swift-object-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:47] PROBLEM - Check systemd state on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:48] PROBLEM - swift-container-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:58] PROBLEM - swift-container-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:27:58] PROBLEM - DPKG on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:07] PROBLEM - swift-object-updater on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:08] PROBLEM - very high load average likely xfs on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:08] PROBLEM - swift-container-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:08] PROBLEM - MD RAID on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:17] PROBLEM - configured eth on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:48] PROBLEM - Disk space on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:48] PROBLEM - swift-object-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:28:57] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:29:38] PROBLEM - swift-account-reaper on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:29:38] PROBLEM - dhclient process on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:29:47] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.385 second response time
[12:30:37] PROBLEM - swift-account-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:30:37] PROBLEM - swift-account-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:30:57] PROBLEM - puppet last run on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:30:57] PROBLEM - swift-container-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:31:38] PROBLEM - swift-container-updater on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:31:38] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:31:47] PROBLEM - dhclient process on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:31:57] PROBLEM - swift-object-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:17] PROBLEM - MD RAID on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:17] PROBLEM - swift-object-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:38] PROBLEM - swift-account-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:47] PROBLEM - swift-account-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:57] PROBLEM - Check systemd state on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:32:57] PROBLEM - puppet last run on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:33:38] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:33:40] PROBLEM - swift-container-updater on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:33:47] PROBLEM - swift-account-reaper on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:33:57] PROBLEM - swift-object-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:34:38] PROBLEM - SSH on ms-be2030 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:34:47] PROBLEM - swift-account-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:34:58] PROBLEM - swift-container-auditor on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:35:18] PROBLEM - swift-object-replicator on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:35:37] PROBLEM - HP RAID on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:35:47] PROBLEM - swift-account-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:36:07] PROBLEM - puppet last run on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:36:57] PROBLEM - swift-object-server on ms-be2030 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[12:37:08] RECOVERY - very high load average likely xfs on ms-be2030 is OK: OK - load average: 38.02, 51.65, 39.06
[12:37:08] RECOVERY - swift-container-replicator on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[12:37:17] RECOVERY - swift-object-replicator on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[12:37:17] RECOVERY - MD RAID on ms-be2030 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[12:37:18] RECOVERY - configured eth on ms-be2030 is OK: OK - interfaces up
[12:37:37] RECOVERY - SSH on ms-be2030 is OK: SSH OK - OpenSSH_7.4p1 Debian-10 (protocol 2.0)
[12:37:37] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2030 is OK: OK ferm input default policy is set
[12:37:38] RECOVERY - swift-container-updater on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[12:37:38] RECOVERY - swift-account-reaper on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[12:37:38] RECOVERY - swift-account-replicator on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[12:37:38] RECOVERY - dhclient process on ms-be2030 is OK: PROCS OK: 0 processes with command name dhclient
[12:37:38] RECOVERY - swift-account-server on ms-be2030 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[12:37:48] RECOVERY - Disk space on ms-be2030 is OK: DISK OK
[12:37:48] RECOVERY - swift-account-auditor on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[12:37:48] RECOVERY - swift-object-auditor on ms-be2030 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[12:37:48] RECOVERY - swift-object-server on ms-be2030 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[12:37:48] RECOVERY - Check systemd state on ms-be2030 is OK: OK - running: The system is fully operational
[12:37:57] RECOVERY - swift-container-auditor on ms-be2030 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[12:37:59] RECOVERY - puppet last run on ms-be2030 is OK: OK: Puppet is currently enabled, last run 42 minutes ago with 0 failures
[12:38:07] RECOVERY - DPKG on ms-be2030 is OK: All packages OK
[12:38:07] RECOVERY - swift-container-server on ms-be2030 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[12:38:07] RECOVERY - swift-object-updater on ms-be2030 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[12:45:57] RECOVERY - HP RAID on ms-be2030 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK
[13:34:37] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:36:28] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.385 second response time
[14:31:30] (CR) Arlolra: Enable RemexHTML on several wikis. (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: Zoranzoki21)
[14:34:21] (PS5) Zoranzoki21: Enable RemexHTML on several wikis. [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971)
[14:35:20] (CR) Zoranzoki21: Enable RemexHTML on several wikis. (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: Zoranzoki21)
[14:53:07] RECOVERY - MariaDB Slave Lag: s1 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89948.37 seconds
[15:25:55] (PS5) ArielGlenn: move fetches of various datasets to dump module from datasets module [puppet] - https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528)
[15:26:36] (CR) ArielGlenn: [C: 2] move fetches of various datasets to dump module from datasets module [puppet] - https://gerrit.wikimedia.org/r/379790 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[15:33:52] (PS1) Ema: bgp: bugfixes in MPUnreachNLRI attribute construction [debs/pybal] - https://gerrit.wikimedia.org/r/379973
[15:50:33] (CR) Subramanya Sastry: "Looks like foundation wiki wasn't using Tidy at all. We know that Remex will do additional fixups beyond what a HTML5 parser does (in orde" [mediawiki-config] - https://gerrit.wikimedia.org/r/379966 (https://phabricator.wikimedia.org/T175971) (owner: Zoranzoki21)
[16:06:00] (PS1) ArielGlenn: move manifest for copying of dumps to labs, peers, to dumps module [puppet] - https://gerrit.wikimedia.org/r/379977 (https://phabricator.wikimedia.org/T175528)
[16:14:02] (PS2) ArielGlenn: move manifest for copying of dumps to labs, peers, to dumps module [puppet] - https://gerrit.wikimedia.org/r/379977 (https://phabricator.wikimedia.org/T175528)
[16:21:35] (PS3) ArielGlenn: move manifest for copying of dumps to labs, peers, to dumps module [puppet] - https://gerrit.wikimedia.org/r/379977 (https://phabricator.wikimedia.org/T175528)
[16:22:28] (CR) ArielGlenn: [C: 2] move manifest for copying of dumps to labs, peers, to dumps module [puppet] - https://gerrit.wikimedia.org/r/379977 (https://phabricator.wikimedia.org/T175528) (owner: ArielGlenn)
[16:23:35] (PS1) Ladsgroup: Add several rights to eliminators in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553)
[16:26:21] (PS1) Ema: travis.yml: use precise for building [debs/pybal] - https://gerrit.wikimedia.org/r/379981
[16:26:56] (CR) Zoranzoki21: [C: 1] Add several rights to eliminators in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/379980 (https://phabricator.wikimedia.org/T176553) (owner: Ladsgroup)
[16:27:38] (CR) Ema: [C: 2] travis.yml: use precise for building [debs/pybal] - https://gerrit.wikimedia.org/r/379981 (owner: Ema)
[16:32:29] (PS1) Ema: travis.yml: use precise for building [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/379982
[16:32:52] Operations, ops-eqiad, DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3629465 (jcrespo) Open>Resolved
[16:33:54] (CR) Ema: [C: 2] travis.yml: use precise for building [debs/pybal] (1.14) - https://gerrit.wikimedia.org/r/379982 (owner: Ema)
[16:45:56] (PS1) Ladsgroup: Add 'eliminator' as a priviliged account [mediawiki-config] - https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554)
[16:58:57] PROBLEM - Apache HTTP on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[16:59:08] PROBLEM - Nginx local proxy to apache on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
[16:59:57] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.110 second response time
[17:00:07] PROBLEM - Nginx local proxy to apache on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
[17:00:08] RECOVERY - Nginx local proxy to apache on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.039 second response time
[17:00:28] PROBLEM - HHVM rendering on mw1278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[17:01:07] RECOVERY - Nginx local proxy to apache on mw1278 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.061 second response time
[17:01:28] RECOVERY - HHVM rendering on mw1278 is OK: HTTP OK: HTTP/1.1 200 OK - 75035 bytes in 0.112 second response time
[17:01:57] PROBLEM - Apache HTTP on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.002 second response time
[17:02:07] PROBLEM - HHVM rendering on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[17:02:08] PROBLEM - Nginx local proxy to apache on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.011 second response time
[17:02:57] RECOVERY - Apache HTTP on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.059 second response time
[17:03:07] RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 75036 bytes in 0.370 second response time
[17:03:08] RECOVERY - Nginx local proxy to apache on mw1276 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.038 second response time
[17:06:31] Operations, MediaWiki-extensions-MultimediaViewer, Multimedia, Traffic, HTTPS: MediaViewer links to creativecommons.org using http instead of https - https://phabricator.wikimedia.org/T176549#3629506 (Tgr) Open>Invalid The link is taken from the file description page (specifically [[h...
[17:18:58] Operations, MediaWiki-extensions-MultimediaViewer, Multimedia, Traffic, HTTPS: MediaViewer links to creativecommons.org using http instead of https - https://phabricator.wikimedia.org/T176549#3629532 (Josve05a) Thanks, hopefully [[https://commons.wikimedia.org/w/index.php?title=Template:Cc-by...
[17:52:49] (CR) MarcoAurelio: [C: -1] "'eliminator' is not a CentralAuth (global) group so you should not modify CommonSettings but add it to https://phabricator.wikimedia.org/s" [mediawiki-config] - https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: Ladsgroup)
[18:04:11] (PS6) Zoranzoki21: Fix problem with throttle rule for John Michael Kohler Art Center. [mediawiki-config] - https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287)
[18:05:18] (CR) Zoranzoki21: Fix problem with throttle rule for John Michael Kohler Art Center. (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/379661 (https://phabricator.wikimedia.org/T176287) (owner: Zoranzoki21)
[19:04:25] (PS2) Ladsgroup: Add 'eliminator' as a priviliged account [mediawiki-config] - https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554)
[19:04:27] (CR) Ladsgroup: "Done, Thanks" [mediawiki-config] - https://gerrit.wikimedia.org/r/379985 (https://phabricator.wikimedia.org/T176554) (owner: Ladsgroup)
[21:12:48] !log Ran scap pull on mwdebug1001 to undo changes to profile https://github.com/wmde/WikibaseDataModel/pull/761/files
[21:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:15:48] PROBLEM - Docker registry HTTPS interface on darmstadtium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:16:47] RECOVERY - Docker registry HTTPS interface on darmstadtium is OK: HTTP OK: HTTP/1.1 200 OK - 2461 bytes in 0.394 second response time
[23:26:25] (CR) MGChecker: [C: 1] "You have to rebase this." [mediawiki-config] - https://gerrit.wikimedia.org/r/377864 (https://phabricator.wikimedia.org/T154371) (owner: Framawiki)