[01:58:22] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 842s
[02:05:33] !log LocalisationUpdate completed (1.18) at Mon Dec 26 02:05:32 UTC 2011
[02:05:43] PROBLEM - ps1-d2-sdtpa-infeed-load-tower-A-phase-Z on ps1-d2-sdtpa is CRITICAL: ps1-d2-sdtpa-infeed-load-tower-A-phase-Z CRITICAL - *2413*
[02:05:50] Logged the message, Master
[02:12:53] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1711s
[02:32:25] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:36:05] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 6s
[02:58:44] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[04:39:57] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[06:13:49] PROBLEM - Disk space on hume is CRITICAL: DISK CRITICAL - free space: /a/static/uncompressed 24279 MB (2% inode=99%):
[07:01:07] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[08:11:38] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[08:26:10] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[09:23:19] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: PROCS CRITICAL: 6 processes with args varnishncsa
[09:23:19] PROBLEM - mobile traffic loggers on cp1041 is CRITICAL: PROCS CRITICAL: 7 processes with args varnishncsa
[09:23:19] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: PROCS CRITICAL: 5 processes with args varnishncsa
[09:33:09] RECOVERY - mobile traffic loggers on cp1041 is OK: PROCS OK: 2 processes with args varnishncsa
[09:33:09] RECOVERY - mobile traffic loggers on cp1044 is OK: PROCS OK: 1 process with args varnishncsa
[09:33:29] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 1 process with args varnishncsa
[09:48:49] RECOVERY - MySQL slave status on es1004 is OK: OK:
[10:16:37] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: PROCS CRITICAL: 7 processes with args varnishncsa
[10:16:37] PROBLEM - mobile traffic loggers on cp1041 is CRITICAL: PROCS CRITICAL: 7 processes with args varnishncsa
[10:16:37] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: PROCS CRITICAL: 6 processes with args varnishncsa
[10:26:17] RECOVERY - mobile traffic loggers on cp1044 is OK: PROCS OK: 3 processes with args varnishncsa
[10:26:17] RECOVERY - mobile traffic loggers on cp1041 is OK: PROCS OK: 4 processes with args varnishncsa
[10:26:17] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 2 processes with args varnishncsa
[10:52:38] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: PROCS CRITICAL: 6 processes with args varnishncsa
[11:02:38] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 1 process with args varnishncsa
[12:02:37] does somebody know if svn.wikimedia.org officially supports HTTPS or not? https://meta.wikimedia.org/w/index.php?title=Talk:Interwiki_map&curid=6514&diff=3180470&oldid=3176411
[12:20:54] Nemo_bis: It does support http and https, but if you want to commit you have to use ssh
[12:21:03] So no commits over https afaik
[13:07:49] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
[13:45:21] 'Wikimedia has experienced PHP/C developers who probably can maintain and
[13:45:21] update such module. '
[13:45:23] riiiight
[13:54:03] hi Domas, what module are you talking about?
[13:56:13] the lua/js discussion on wikitech
[14:08:35] So pageview dumps are down... I assume the relevant people are aware?
[14:12:00] I don't know who would know -- domas perhaps?
[14:37:32] hm
[14:38:08] apergos maybe
[14:38:20] oh wait
[14:38:22] we lost data
[14:38:28] or not
[14:38:29] hm
[14:38:34] they changed something
[14:39:43] clowny
[14:41:02] heh
[14:41:08] log filter written in python?
[14:49:01] jarry1250: found the problem
[14:49:15] Was data lost? :/
[14:50:49] Jarry1250: yes
[14:51:03] Jarry1250: someone screwed up something without thinking too much
[14:51:14] :( But you can fix it from now on?
[14:51:26] got a temporary fix in
[14:52:22] Cool, thanks.
[14:53:09] trying to figure out how people did it
[14:53:15] it takes effort to screw up in this way
[14:53:49] damn, must be in private repo
[14:55:01] Perhaps in the long term it would be good to have something checking dump size.
[14:57:56] maybe
[14:58:06] it would be good if people checked when they do something
[14:58:08] what is the impact
[14:58:17] then you need less of long term checks
[14:58:18] :)
[15:00:59] annoying though to have such long term breakages
[15:07:05] PROBLEM - Host amslvs3 is DOWN: PING CRITICAL - Packet loss = 100%
[15:30:25] PROBLEM - BGP status on csw2-esams is CRITICAL: CRITICAL: host 91.198.174.244, sessions up: 3, down: 1, shutdown: 0; Peering with AS64600 not established
[17:10:54] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[18:00:49] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 198 MB (2% inode=60%): /var/lib/ureadahead/debugfs 198 MB (2% inode=60%):
[18:11:55] !log reedy synchronized php-1.18/extensions/ArticleFeedback/SpecialArticleFeedback.php 'r107332 To fix fatal in prod'
[18:12:04] Nemo_bis, ^
[18:12:07] Logged the message, Master
[18:12:34] Reedy, wow, thanks :)
[18:20:09] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 198 MB (2% inode=60%): /var/lib/ureadahead/debugfs 198 MB (2% inode=60%):
[18:21:00] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[18:29:50] RECOVERY - Disk space on srv223 is OK: DISK OK
[18:35:00] PROBLEM - Puppet freshness on es1002 is CRITICAL: Puppet has not run in the last 10 hours
[20:02:30] !log reedy synchronized php-1.18/extensions/ConfirmEdit/FancyCaptcha.class.php 'r107340'
[20:02:42] Logged the message, Master
[22:12:20] New patchset: Lcarr; "adding in accept all from localhost to logging fw" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1713
[22:12:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1713
[22:19:02] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1713
[22:19:03] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1713
[22:24:38] New patchset: Lcarr; "fixing accept localhost" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1714
[22:24:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1714
[22:24:58] New review: Lcarr; "subnet masks are fun!" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1714
[22:24:58] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1714
[22:28:58] New patchset: Lcarr; "Revert "fixing accept localhost"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1715
[22:29:13] New patchset: Lcarr; "Revert "adding in accept all from localhost to logging fw"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1716
[22:29:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1715
[22:29:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1716
[22:29:33] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1715
[22:29:34] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1715
[22:30:12] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1716
[22:30:13] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1716
[23:04:43] gn8 folks
[23:17:25] PROBLEM - Puppet freshness on brewster is CRITICAL: Puppet has not run in the last 10 hours
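The 14:55:01 suggestion about "something checking dump size" could be sketched roughly as below. This is a minimal illustration, not the actual Wikimedia dumps tooling: the function name, the 50% threshold, and the example sizes are all assumptions chosen for the sketch.

```python
# Hypothetical sanity check in the spirit of the 14:55 suggestion: flag a
# freshly generated dump file whose size drops sharply below recent history,
# which would have caught a silently truncated pageview dump sooner.
from statistics import median

def looks_truncated(new_size, recent_sizes, ratio=0.5):
    """Return True if new_size is under `ratio` of the median recent size.

    `recent_sizes` is a list of byte counts for recent good dumps; with no
    history there is nothing to compare against, so we do not alarm.
    """
    if not recent_sizes:
        return False
    return new_size < ratio * median(recent_sizes)

# Illustrative numbers: if recent dumps are ~80 MB, a 5 MB file is suspicious.
recent = [81_000_000, 79_500_000, 82_300_000, 80_100_000]
print(looks_truncated(5_000_000, recent))   # True  -> likely truncated
print(looks_truncated(78_000_000, recent))  # False -> within normal range
```

A median rather than a mean keeps one earlier bad (tiny) dump in the history window from dragging the baseline down and masking the next failure.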