[00:16:33] wtf, logging into phabricator:
[00:16:33] Unhandled Exception ("Exception")
[00:16:35] LDAP record query returned more than one result. The query must uniquely identify a record.
[00:17:08] * Platonides realises he put "phabricator" as username
[00:17:31] Platonides: Still crap error handling :P
[00:18:33] That error has shown up before.
[00:56:29] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:06:09] PROBLEM - puppet last run on es1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:25:29] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[01:32:19] PROBLEM - puppet last run on wtp1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:34:09] RECOVERY - puppet last run on es1019 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[02:01:19] RECOVERY - puppet last run on wtp1017 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[02:03:19] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[02:19:19] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[02:19:46] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.13) (duration: 07m 26s)
[02:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:25:08] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Feb 26 02:25:07 UTC 2017 (duration 5m 21s)
[02:25:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:51:59] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[02:52:19] PROBLEM - Misc HTTP 5xx reqs/min on graphite2001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[02:58:59] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[03:01:19] RECOVERY - Misc HTTP 5xx reqs/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:01:59] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:02:59] PROBLEM - puppet last run on mc1013 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[03:18:39] RECOVERY - etcdmirror-conftool-eqiad-wmnet service on conf2002 is OK: OK - etcdmirror-conftool-eqiad-wmnet is active
[03:21:39] PROBLEM - etcdmirror-conftool-eqiad-wmnet service on conf2002 is CRITICAL: CRITICAL - Expecting active but unit etcdmirror-conftool-eqiad-wmnet is failed
[03:21:49] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 600.25 seconds
[03:26:49] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 236.23 seconds
[03:31:59] RECOVERY - puppet last run on mc1013 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[04:06:54] (Abandoned) Juniorsys: Linting changes (multiple) [puppet] - https://gerrit.wikimedia.org/r/334299 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[04:55:02] (PS7) Juniorsys: openstack: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334301 (https://phabricator.wikimedia.org/T93645)
[05:41:09] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:09:09] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:18:39] RECOVERY - Check systemd state on conf2002 is OK: OK - running: The system is fully operational
[06:21:39] PROBLEM - Check systemd state on conf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:30:07] (PS1) Tim Landscheidt: Tools: Require gridengine-master for gridengine_resource [puppet] - https://gerrit.wikimedia.org/r/339921 (https://phabricator.wikimedia.org/T127388)
[06:58:19] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[07:07:39] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 266 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[07:26:19] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[07:37:39] PROBLEM - puppet last run on labvirt1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:53:29] PROBLEM - puppet last run on elastic1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:02:59] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:05:39] RECOVERY - puppet last run on labvirt1003 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[08:21:29] RECOVERY - puppet last run on elastic1051 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[08:24:29] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:31:59] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[08:33:29] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.135 second response time
[08:53:29] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[09:00:29] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.181 second response time
[10:00:29] PROBLEM - configured eth on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:00:39] PROBLEM - Check whether ferm is active by checking the default input chain on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:00:49] PROBLEM - dhclient process on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:00:49] PROBLEM - Check systemd state on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:00:49] PROBLEM - Check size of conntrack table on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:00:50] PROBLEM - puppet last run on bast3001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:01:19] RECOVERY - configured eth on bast3001 is OK: OK - interfaces up
[10:01:29] RECOVERY - Check whether ferm is active by checking the default input chain on bast3001 is OK: OK ferm input default policy is set
[10:01:40] RECOVERY - dhclient process on bast3001 is OK: PROCS OK: 0 processes with command name dhclient
[10:01:40] RECOVERY - Check size of conntrack table on bast3001 is OK: OK: nf_conntrack is 0 % full
[10:01:40] RECOVERY - Check systemd state on bast3001 is OK: OK - running: The system is fully operational
[10:02:49] RECOVERY - puppet last run on bast3001 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures
[10:12:39] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 17 probes of 266 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[12:02:00] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:02:59] PROBLEM - Check systemd state on labstore1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[12:02:59] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1005 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed
[12:09:59] RECOVERY - Check systemd state on labstore1005 is OK: OK - running: The system is fully operational
[12:09:59] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1005 is OK: OK - maintain-dbusers is active
[12:29:59] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[13:22:19] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:47:29] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:51:19] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[14:04:29] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:04:49] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:06:53] Operations, Traffic: cp2017 froze and stopped serving traffic - https://phabricator.wikimedia.org/T159056#3055512 (ema) Nothing particularly interesting in `kern.log` except perhaps for the hrtimer message? Host powercycled at 19:55: ``` Feb 24 06:25:06 cp2017 kernel: [1346616.572311] Process accounting...
[14:06:58] Operations, Traffic: cp2017 froze and stopped serving traffic - https://phabricator.wikimedia.org/T159056#3055991 (ema) p:Triage>Normal
[14:08:19] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:08:39] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[14:16:29] RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[14:31:19] PROBLEM - puppet last run on ms-be1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:36:29] PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:00:19] RECOVERY - puppet last run on ms-be1008 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[15:04:29] RECOVERY - puppet last run on ms-be1023 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[15:48:58] !ops
[15:49:27] NotASpy: lol goatsex
[15:51:02] nota
[15:51:05] NotASpy:
[15:54:32] Reedy: they're in -tech, if you can dispatch them from there also.
[15:54:43] NotASpy: Unfortunately, I can't
[15:56:24] No idea what the old topic is, so blank will do for now
[15:56:55] was just about to copy/paste the old topic
[15:57:28] and they've finally been k-lined
[15:57:43] And back immediately
[15:57:55] in the same /24
[15:58:08] The block will at least keep them out of here :P
[16:02:39] PROBLEM - puppet last run on poolcounter1002 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[16:03:30] They're appearing in other channels now
[16:07:51] I guess its the same people from last week
[16:12:13] !ops
[16:12:18] grumble sucks dix dix dix
[16:12:26] no kline allowed
[16:12:28] fucker
[16:12:30] im edgy
[16:12:43] Reedy ^^
[16:13:25] * Oresrian thinks +t or +r would probably be a good idea, at least temporarially
[16:22:29] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:30:39] RECOVERY - puppet last run on poolcounter1002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[16:33:29] PROBLEM - puppet last run on mc1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:50:29] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[16:55:41] Reedy: got a minute to dry-run a maintenance script for me, please?
[16:55:49] I could
[16:55:52] :P
[16:56:00] namespaceDupes.php on ext.wikipedia
[16:56:15] and put the output on a phab paste?
[16:56:22] thanks ^_^
[16:57:39] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479
[16:58:39] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4032041 keys, up 118 days 8 hours - replication_delay is 0
[16:59:03] TabbyCat: https://phabricator.wikimedia.org/P4983
[17:00:46] Reedy: thanks, looks like that wiki needs serious maintenance
[17:00:54] TabbyCat: We can just run it
[17:00:56] and fix 99% of it
[17:01:00] we've renamed the NS_CATEGORY namespace
[17:01:11] Categoria -> Categor *í* a
[17:01:18] note the í
[17:01:29] RECOVERY - puppet last run on mc1034 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[17:01:45] I wonder if the script could move all bad-named Categoria: pages to Categoría ones Reedy ?
[17:02:41] if it can be run then T158914
[17:02:42] T158914: namespaceDupes.php for ext.wikipedia - https://phabricator.wikimedia.org/T158914
[17:05:00] but should be on terbium iirc, well, you know better
[17:05:19] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:05:25] Doesn't matter :)
[17:05:26] $this->addOption( 'source-pseudo-namespace', "Move all pages with the given source " .
[17:05:26] "prefix (with an implied colon following it). If --dest-namespace is not specified, " .
[17:05:26] "the colon will be replaced with a hyphen.",
[17:05:29] false, true );
[17:05:31] $this->addOption( 'dest-namespace', "In combination with --source-pseudo-namespace, " .
[17:05:33] "specify the namespace ID of the destination.", false, true );
[17:05:49] It's not a long running script
[17:06:02] be my guest then
[17:07:07] Do I need to pass the parameters?
[17:07:16] Or can I just run it as is?
[17:07:49] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[17:08:17] Reedy: see T158914
[17:08:17] T158914: namespaceDupes.php for ext.wikipedia - https://phabricator.wikimedia.org/T158914
[17:08:39] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[17:08:55] and T157846
[17:08:55] T157846: NS_CATEGORY (TALK) misspelt for 'ext' - https://phabricator.wikimedia.org/T157846
[17:10:23] TabbyCat: https://phabricator.wikimedia.org/P4984
[17:10:31] !log ran namespaceDupes for extwiki
[17:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:36] hmm
[17:14:07] MariaDB [extwiki_p]> select count(*) from page where page_namespace = "Categoria"; => 4123
[17:20:11] Reedy: seems like resolved and "touching" the pages refreshes their titles and appears correctly named afterwards
[17:20:24] * TabbyCat runs touch.py
[17:20:25] heh
[17:24:39] PROBLEM - puppet last run on aqs1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:33:20] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[17:51:39] RECOVERY - puppet last run on aqs1004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[17:57:29] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:03:47] TabbyCat: page_namespace is an integer...
[18:04:13] Is that a pseudo-query?
[18:17:29] Yvette: we modified the MessagesExt.php file in mediawiki-core. Categories were badly named for ext.
[18:17:56] but it seems that while the wiki refreshes the links are being corrected
[18:24:15] https://p.defau.lt/?7gdcNEhUwydqHQmVvUJP4A
[18:24:17] Huh, bizarre.
[18:25:29] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[18:26:19] PROBLEM - puppet last run on db1084 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:29:19] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:47:39] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:54:19] RECOVERY - puppet last run on db1084 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[18:57:19] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[19:15:39] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[19:28:39] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:36:37] Operations, MediaWiki-extensions-InterwikiSorting, Wikidata, Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3056323 (Addshore) @Lydia_Pintscher @Lea_Lacroix_WMDE I guess this does still need announcement then?
[19:56:39] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[19:58:29] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:27:29] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[21:00:19] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[21:26:31] (Draft1) Paladox: Gerrit: Make GerritBot report author of patch and the uploader of patch [puppet] - https://gerrit.wikimedia.org/r/339980
[21:26:47] (PS2) Paladox: Gerrit: Make GerritBot report author of patch and the uploader of patch [puppet] - https://gerrit.wikimedia.org/r/339980 (https://phabricator.wikimedia.org/T76291)
[21:28:19] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:28:30] (PS3) Paladox: Gerrit: Make GerritBot report author of patch and the uploader of patch [puppet] - https://gerrit.wikimedia.org/r/339980 (https://phabricator.wikimedia.org/T76291)
[21:29:19] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[21:30:11] (PS4) Paladox: Gerrit: Make GerritBot report author of patch and the uploader of patch [puppet] - https://gerrit.wikimedia.org/r/339980 (https://phabricator.wikimedia.org/T76291)
[21:46:09] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:56:19] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[22:15:09] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[22:43:19] PROBLEM - puppet last run on wdqs1003 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[23:00:59] PROBLEM - cassandra-c SSL 10.192.16.167:7001 on restbase2002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[23:01:49] PROBLEM - cassandra-c CQL 10.192.16.167:9042 on restbase2002 is CRITICAL: connect to address 10.192.16.167 and port 9042: Connection refused
[23:02:49] PROBLEM - cassandra-a SSL 10.192.48.54:7001 on restbase2009 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused
[23:03:29] PROBLEM - cassandra-a CQL 10.192.48.54:9042 on restbase2009 is CRITICAL: connect to address 10.192.48.54 and port 9042: Connection refused
[23:03:29] PROBLEM - Check systemd state on restbase2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[23:03:39] PROBLEM - cassandra-c service on restbase2002 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed
[23:04:29] RECOVERY - cassandra-a CQL 10.192.48.54:9042 on restbase2009 is OK: TCP OK - 0.036 second response time on 10.192.48.54 port 9042
[23:04:49] RECOVERY - cassandra-a SSL 10.192.48.54:7001 on restbase2009 is OK: SSL OK - Certificate restbase2009-a valid until 2017-09-12 15:36:07 +0000 (expires in 197 days)
[23:12:20] RECOVERY - puppet last run on wdqs1003 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[23:15:29] RECOVERY - Check systemd state on restbase2002 is OK: OK - running: The system is fully operational
[23:15:39] RECOVERY - cassandra-c service on restbase2002 is OK: OK - cassandra-c is active
[23:16:49] RECOVERY - cassandra-c CQL 10.192.16.167:9042 on restbase2002 is OK: TCP OK - 0.038 second response time on 10.192.16.167 port 9042
[23:16:59] RECOVERY - cassandra-c SSL 10.192.16.167:7001 on restbase2002 is OK: SSL OK - Certificate restbase2002-c valid until 2017-09-12 15:35:06 +0000 (expires in 197 days)
[23:20:39] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL:
replication_delay is 660 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4056194 keys, up 118 days 14 hours - replication_delay is 660
[23:20:39] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 659 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4056066 keys, up 118 days 14 hours - replication_delay is 659
[23:31:39] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4034099 keys, up 118 days 15 hours - replication_delay is 0
[23:33:39] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4033814 keys, up 118 days 15 hours - replication_delay is 0
[23:58:55] (CR) Paladox: "Tested and works here https://phab-01.wmflabs.org/T20#2762" [puppet] - https://gerrit.wikimedia.org/r/339980 (https://phabricator.wikimedia.org/T76291) (owner: Paladox)