[00:53:37] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#3184119 (Liuxinyu970226) p:Normal>Lowest to reflect the actual...
[04:08:43] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=2864.90 Read Requests/Sec=2397.80 Write Requests/Sec=612.70 KBytes Read/Sec=32312.80 KBytes_Written/Sec=8824.00
[04:17:43] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.80 Read Requests/Sec=0.20 Write Requests/Sec=0.30 KBytes Read/Sec=1.20 KBytes_Written/Sec=11.20
[04:53:13] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:53:23] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:54:03] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[04:54:13] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:16:23] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:44:33] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[07:18:18] (PS3) ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507)
[07:18:21] (PS1) ArielGlenn: extra verbosity for page ranges we will probably toss later [dumps] - https://gerrit.wikimedia.org/r/348268
[07:18:40] (CR) jerkins-bot: [V: -1] extra verbosity for page ranges we will probably toss later [dumps] - https://gerrit.wikimedia.org/r/348268 (owner: ArielGlenn)
[07:18:42] (CR) jerkins-bot: [V: -1] scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507) (owner: ArielGlenn)
[07:23:53] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: /srv/deployment/ocg/output 7633 MB (3% inode=97%)
[07:26:13] (PS2) ArielGlenn: extra verbosity for page ranges we will probably toss later [dumps] - https://gerrit.wikimedia.org/r/348268
[07:26:15] (PS4) ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507)
[07:26:42] (CR) jerkins-bot: [V: -1] scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507) (owner: ArielGlenn)
[07:27:40] (PS5) ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507)
[07:50:03] PROBLEM - nova-compute process on labvirt1009 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/nova-compute
[07:51:03] RECOVERY - nova-compute process on labvirt1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute
[08:34:23] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479
[08:35:24] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 2967835 keys, up 22 days 16 hours - replication_delay is 0
[09:10:33] PROBLEM - puppet last run on mw2119 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check_sysctl]
[09:37:33] RECOVERY - puppet last run on mw2119 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[11:58:03] PROBLEM - puppet last run on cp4016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:12:27] that Redis instance keeps flapping during replication
[12:14:23] will check later on :)
[12:27:03] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[12:34:38] (PS1) Reedy: wfLoadExtension( 'ZeroBanner' ) in mobile.php [mediawiki-config] - https://gerrit.wikimedia.org/r/348295 (https://phabricator.wikimedia.org/T163041)
[12:38:00] (PS1) Reedy: PageTriage to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/348296
[12:39:15] (CR) Umherirrender: wfLoadExtension( 'ZeroBanner' ) in mobile.php (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/348295 (https://phabricator.wikimedia.org/T163041) (owner: Reedy)
[12:45:42] (CR) Umherirrender: [C: +1] "You can link T139800" [mediawiki-config] - https://gerrit.wikimedia.org/r/348296 (owner: Reedy)
[13:56:13] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:23:33] Operations, DBA, Traffic: dbtree broken (for some users?) - https://phabricator.wikimedia.org/T162976#3184518 (bd808) It's working for me today with this response header: `X-Cache: cp1058 miss, cp1045 miss`. This indicates a different route than I was getting on Thursday/Friday. I can still recreate...
[14:25:13] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[15:02:24] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:02:24] PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[15:04:13] RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:04:15] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[15:06:40] (PS6) ArielGlenn: scripts to generate a series of checkpoint files for a dump run manually [dumps] - https://gerrit.wikimedia.org/r/342846 (https://phabricator.wikimedia.org/T160507)
[15:06:42] (PS1) ArielGlenn: permit the page range job shell script to run without locks if desired [dumps] - https://gerrit.wikimedia.org/r/348302
[15:50:51] (Draft1) Addshore: Add InterwikiSortOrders to noc docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/348305
[15:58:53] (PS1) Addshore: Configure InterwikiSorting orders for Wiktionaries [mediawiki-config] - https://gerrit.wikimedia.org/r/348306 (https://phabricator.wikimedia.org/T162926)
[16:05:53] (PS1) Addshore: Use group0 to reduce lines for WMDE related config settings [mediawiki-config] - https://gerrit.wikimedia.org/r/348307
[16:53:15] (PS1) Urbanecm: Enable NewUserMessage on oh_classicalwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043)
[16:56:24] (PS2) Urbanecm: Enable NewUserMessage on zh_classicalwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/348309 (https://phabricator.wikimedia.org/T163043)
[17:44:21] sure is quiet in here today, a nice change
[17:45:44] Saturday and Sunday are often quieter
[17:56:13] PROBLEM - puppet last run on elastic1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:11:43] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479
[18:12:43] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 2963876 keys, up 23 days 1 hours - replication_delay is 0
[18:24:13] RECOVERY - puppet last run on elastic1020 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[22:07:43] PROBLEM - puppet last run on lvs4004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:35:43] RECOVERY - puppet last run on lvs4004 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures