[00:12:26] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[02:24:24] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.18) (duration: 10m 03s)
[02:24:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:30:18] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Sep 11 02:30:18 UTC 2016 (duration 5m 55s)
[02:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:37:24] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/swift/account.ring.gz]
[03:38:07] PROBLEM - puppet last run on mw2155 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:02:21] RECOVERY - puppet last run on ms-be1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:03:07] RECOVERY - puppet last run on mw2155 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:52:02] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml]
[05:16:42] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:25:12] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check-fresh-files-in-dir.py]
[06:50:05] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:48] PROBLEM - puppet last run on elastic2020 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 5 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[tshark],Package[tmux]
[07:00:02] PROBLEM - puppet last run on elastic2014 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tree]
[07:24:40] RECOVERY - puppet last run on elastic2014 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[07:25:10] RECOVERY - puppet last run on elastic2020 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:44:41] !log ladsgroup@tin:~$ mwscript resetUserEmail.php --wiki=fawiki Sinasalek
[08:44:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:39:37] (PS1) Urbanecm: Enable WikidataPageBanner on itwikiwoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/309823 (https://phabricator.wikimedia.org/T145328)
[10:39:52] (PS2) Urbanecm: Enable WikidataPageBanner on itwikiwoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/309823 (https://phabricator.wikimedia.org/T145328)
[13:53:11] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apache2/sites-available/00-dummy.conf]
[14:14:03] (Draft1) Paladox: Enable NoteDB in gerrit [puppet] - https://gerrit.wikimedia.org/r/309838
[14:14:31] (PS2) Paladox: Enable NoteDB in gerrit [puppet] - https://gerrit.wikimedia.org/r/309838
[14:14:50] (PS3) Paladox: Enable NoteDB in gerrit [puppet] - https://gerrit.wikimedia.org/r/309838 (https://phabricator.wikimedia.org/T37534)
[14:15:38] (CR) Paladox: "I did testing on https://gerrit-test.wmflabs.org and this seems to work." [puppet] - https://gerrit.wikimedia.org/r/309838 (https://phabricator.wikimedia.org/T37534) (owner: Paladox)
[14:18:23] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:26:08] (Abandoned) Paladox: Enable NoteDB in gerrit [puppet] - https://gerrit.wikimedia.org/r/309838 (https://phabricator.wikimedia.org/T37534) (owner: Paladox)
[14:31:47] Operations, DBA, MediaWiki-Maintenance-scripts, Release-Engineering-Team, and 2 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#2626722 (jcrespo) I would like some opinion from other ops or develope...
[16:58:50] (CR) Volans: [C: -1] "It seems that there is a missing dependency AFAICT from the role::labs::dns_floating_ip_updater Puppet class." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/309708 (owner: Alex Monk)
[17:08:45] (PS1) ArielGlenn: use max_allowed_packets for all mysql commands, not just mysqldump [dumps] - https://gerrit.wikimedia.org/r/309850
[17:08:47] (PS1) ArielGlenn: handle empty check for text, html, plain files [dumps] - https://gerrit.wikimedia.org/r/309851
[17:29:50] (PS2) Alex Monk: dns-floating-ip-updater: use python's ipaddress class to determine PTR FQDNs for IPs [puppet] - https://gerrit.wikimedia.org/r/309708
[17:31:11] PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1791.98 seconds
[17:42:28] (CR) ArielGlenn: [C: 2] use max_allowed_packets for all mysql commands, not just mysqldump [dumps] - https://gerrit.wikimedia.org/r/309850 (owner: ArielGlenn)
[17:42:55] (CR) ArielGlenn: [C: 2] handle empty check for text, html, plain files [dumps] - https://gerrit.wikimedia.org/r/309851 (owner: ArielGlenn)
[17:43:58] ACKNOWLEDGEMENT - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 2537.11 seconds Jcrespo ongoing schema change
[19:08:45] (PS1) Aaron Schulz: Lower wgMaxUserDBWriteDuration to 4 [mediawiki-config] - https://gerrit.wikimedia.org/r/309864
[21:14:29] (CR) Hashar: [C: 1] "That is better :} Not sure why I got the 'labs' realm set whenever the FQDN contained 'labs' anywhere.." [puppet] - https://gerrit.wikimedia.org/r/309685 (owner: Alex Monk)
[21:33:15] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:33:42] (PS2) Alex Monk: hiera_lookup util: add support for labtest realm, fix check for labs [puppet] - https://gerrit.wikimedia.org/r/309685
[21:45:31] (PS1) Odder: Add upload_by_url right to Commons bots [mediawiki-config] - https://gerrit.wikimedia.org/r/309911 (https://phabricator.wikimedia.org/T145010)
[21:47:11] Puppet, Beta-Cluster-reproducible: puppet failures due to "Could not find class" or "Puppet::Parser::AST::Resource failed with error ArgumentError: Invalid resource type" - https://phabricator.wikimedia.org/T131946#2183851 (AlexMonk-WMF) ```Sep 11 20:40:24 deployment-mathoid puppet-agent[19129]: Could no...
[21:47:20] (CR) Odder: [C: -1] "Let's give the Commons community a few more days to establish consensus for this change." [mediawiki-config] - https://gerrit.wikimedia.org/r/309911 (https://phabricator.wikimedia.org/T145010) (owner: Odder)
[21:58:37] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[21:59:58] (PS1) Odder: Allow Commons 'crats to manage accountcreator group [mediawiki-config] - https://gerrit.wikimedia.org/r/309912 (https://phabricator.wikimedia.org/T144689)
[22:01:03] (CR) Odder: [C: -1] "Please note that this needs to gain Commmons community's consensus before being merged." [mediawiki-config] - https://gerrit.wikimedia.org/r/309912 (https://phabricator.wikimedia.org/T144689) (owner: Odder)
[22:31:37] (PS2) Aaron Schulz: Lower wgMaxUserDBWriteDuration to 4 [mediawiki-config] - https://gerrit.wikimedia.org/r/309864
[22:31:43] (CR) Aaron Schulz: [C: 2] Lower wgMaxUserDBWriteDuration to 4 [mediawiki-config] - https://gerrit.wikimedia.org/r/309864 (owner: Aaron Schulz)
[22:32:11] (Merged) jenkins-bot: Lower wgMaxUserDBWriteDuration to 4 [mediawiki-config] - https://gerrit.wikimedia.org/r/309864 (owner: Aaron Schulz)
[22:33:49] !log aaron@tin Synchronized wmf-config/CommonSettings.php: Lower wgMaxUserDBWriteDuration to 4 (duration: 00m 47s)
[22:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:46:55] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/hphpd/hphpd.ini]
[22:51:58] PROBLEM - HTTPS on cp3036 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:51:58] PROBLEM - HTTPS on cp2003 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:51:58] PROBLEM - HTTPS on cp1099 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp4012 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp2009 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp4020 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp2018 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp1053 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:07] PROBLEM - HTTPS on cp4009 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:08] PROBLEM - HTTPS on cp3008 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:08] PROBLEM - HTTPS on cp1050 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:09] PROBLEM - HTTPS on cp1045 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:09] PROBLEM - HTTPS on cp3031 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:10] PROBLEM - HTTPS on cp1061 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:10] PROBLEM - HTTPS on cp3003 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:37] PROBLEM - HTTPS on cp4018 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:37] PROBLEM - HTTPS on cp4016 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:37] PROBLEM - HTTPS on cp2004 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:38] PROBLEM - HTTPS on cp4011 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:48] PROBLEM - HTTPS on cp3037 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:48] PROBLEM - HTTPS on cp4002 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:48] PROBLEM - HTTPS on cp1071 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:49] PROBLEM - HTTPS on cp1060 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:49] PROBLEM - HTTPS on cp1054 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:49] PROBLEM - HTTPS on cp1065 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:49] PROBLEM - HTTPS on cp1052 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:49] PROBLEM - HTTPS on cp3040 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp3043 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp4006 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp1074 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp4004 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp3042 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:57] PROBLEM - HTTPS on cp4010 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:58] PROBLEM - HTTPS on cp2025 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:58] PROBLEM - HTTPS on cp2015 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:52:58] PROBLEM - HTTPS on cp4001 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:00] PROBLEM - HTTPS on cp2023 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:00] PROBLEM - HTTPS on cp1073 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:01] PROBLEM - HTTPS on cp4005 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:18] PROBLEM - HTTPS on cp1059 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:18] PROBLEM - HTTPS on cp3047 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:31] PROBLEM - HTTPS on cp3032 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:31] PROBLEM - HTTPS on cp2001 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:31] PROBLEM - HTTPS on cp2022 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:31] PROBLEM - HTTPS on cp3048 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:53:31] PROBLEM - HTTPS on cp3030 is CRITICAL: SSLXNN CRITICAL - 37 CRITICAL
[22:55:22] it's this: CRITICAL - Certificate *.wikipedia.org valid until 2016-12-10 22:46:04 +0000 (expires in 89 days)
[22:56:08] basically it just went over the 90 day mark
[23:01:58] 90 days is "critical" ? :/
[23:04:13] modules/nagios_common/files/check_commands/check_sslxNN.cfg: command_line $USER1$/check_sslxNN --critical 90 -H $HOSTADDRESS$ -p 443
[23:04:14] yes
[23:04:42] than what's a warning for it?
[23:05:14] I think it's a bit paranoid :)
[23:06:50] I'm assuming that goes to the default
[23:06:57] Which for check_ssl is 30
[23:09:25] but we'll never hit that with the current config, because critical is already set higher
[23:12:13] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:17:25] PROBLEM - Disk space on scb1002 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=87%)
[23:47:45] (PS1) Faidon Liambotis: Reduce check_sslxNN alert thresholds to 30d/15d [puppet] - https://gerrit.wikimedia.org/r/309923
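
Note on the threshold discussion above (22:55 to 23:09): a minimal Python sketch of why a warning at 30 days can never fire while critical is set to 90. This is not the check_sslxNN source. The function name `classify`, the strict `<` comparison, and the critical-before-warning ordering are assumptions made for illustration, and reading "30d/15d" in the last patch title as warning 30 / critical 15 is also an assumption.

```python
def classify(days_left: int, warning: int, critical: int) -> str:
    """Return a Nagios-style state for a certificate expiring in days_left days.

    Hypothetical helper: assumes critical is evaluated before warning and
    that a threshold trips once days_left drops strictly below it.
    """
    if days_left < critical:
        return "CRITICAL"
    if days_left < warning:
        return "WARNING"
    return "OK"


# Thresholds as discussed in the log: --critical 90, warning assumed to be 30.
# Any value below 30 is also below 90, so WARNING is unreachable.
states_seen = {classify(d, warning=30, critical=90) for d in range(0, 400)}
print(sorted(states_seen))

# With warning=30 and critical=15 (one reading of "30d/15d"), the same
# certificate at 89 days is OK, and WARNING becomes reachable again.
for days in (89, 20, 10):
    print(days, classify(days, warning=30, critical=15))
```

Under these assumptions the first print shows only CRITICAL and OK, which matches the 23:09:25 remark that the warning level is never hit with the current config.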