[00:06:03] New patchset: Pyoungmeister; "assigning more stuff to fake hosts to make the catch-all term the same as in pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3178
[00:06:15] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3178
[00:08:09] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3178
[00:08:12] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3178
[00:17:08] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:41] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.922 seconds
[00:35:35] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:38:19] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.128 seconds
[00:40:07] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:44:01] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 4.861 seconds
[00:52:25] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:56:28] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.098 seconds
[00:59:28] PROBLEM - Lucene on search1015 is CRITICAL: Connection refused
[01:02:46] RECOVERY - RAID on srv197 is OK: OK: no RAID installed
[01:05:19] RECOVERY - RAID on srv243 is OK: OK: no RAID installed
[01:06:52] New patchset: Ryan Lane; "Fixing apache config for gerrit to work with labs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3179
[01:07:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3179
[01:07:42] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3179
[01:07:44] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3179
[01:10:33] New patchset: Ryan Lane; "Revert "Fixing apache config for gerrit to work with labs"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3180
[01:10:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3180
[01:10:50] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3180
[01:10:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3180
[01:16:55] Secure server down?
[01:17:28] PROBLEM - HTTP on singer is CRITICAL: Connection refused
[01:23:37] RECOVERY - HTTP on singer is OK: HTTP OK - HTTP/1.1 302 Found - 0.004 second response time
[01:25:16] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:25:35] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:27:13] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.382 seconds
[01:27:58] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:29:56] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1843 bytes in 8.417 seconds
[01:36:04] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:40] PROBLEM - MySQL Slave Running on db1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:42:13] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.982 seconds
[01:45:04] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[01:49:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:51:49] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:00:04] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.529 seconds
[02:06:58] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:08:20] gn8 folks
[02:10:25] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:17:36] !log LocalisationUpdate completed (1.19) at Thu Mar 15 02:17:35 UTC 2012
[02:17:40] Logged the message, Master
[02:22:40] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:58] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:43:04] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.093 seconds
[02:49:13] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:52:49] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:53:34] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:01:40] Would anyone like a question about API usage in a mobile app?
[03:01:53] https://meta.wikimedia.org/w/index.php?diff=3569474&oldid=3567703&rcid=3184637#Query_Limit_from_mobile_or_web_Client
[03:02:41] I think I can make up an answer and hope it's right.
[03:03:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.103 seconds
[03:09:20] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:10:27] https://meta.wikimedia.org/w/index.php?title=Tech&diff=3569658&oldid=3569474
[03:10:39] Hmmm.
[03:11:08] snitch is a bot. He's going to stalk Meta-Wiki's [[Tech]] page for us.
[03:11:14] Which can't be as annoying as gerrit or nagios.
[03:11:22] !stalk meta.wikimedia page Tech
[03:11:23] Rule added.
[03:15:02] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 6.747 seconds
[03:15:29] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.030 seconds
[03:19:51] [[Tech]]; Philippe (WMF); Yep. ; https://meta.wikimedia.org/w/index.php?diff=3569666&oldid=3569658&rcid=3184764
[03:21:29] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:21:56] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:25:59] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[03:25:59] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.751 seconds
[03:27:56] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[03:28:05] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:28:05] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:31:50] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 5.460 seconds
[03:34:59] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[03:34:59] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[03:45:02] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:49:05] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.821 seconds
[03:52:59] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours
[04:01:31] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:03:46] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 5.822 seconds
[04:10:13] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:10:13] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:12:10] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 0.021 seconds
[04:12:10] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.008 seconds
[04:26:52] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:26:52] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:33:01] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.448 seconds
[04:33:01] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.455 seconds
[04:43:04] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:13] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:13] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:55:58] New patchset: Dzahn; "allow virt[1-5] subnet to access spence" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3186
[04:56:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3186
[04:57:00] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3186
[04:57:03] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3186
[04:59:37] New patchset: Dzahn; "comment out Swift HTTP monitoring on non-production hosts again, this used to work for a day now they socket timeout and they are not in production anyways" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3187
[04:59:49] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3187
[05:01:46] New patchset: Dzahn; "comment out Swift HTTP monitoring on non-production hosts again, this used to work for a day now they socket timeout, so i expect they have been stopped deliberately" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3187
[05:01:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3187
[05:02:25] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3187
[05:02:28] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3187
[05:20:30] New patchset: Asher; "vcl_config, not vcl_options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3188
[05:20:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3188
[05:22:37] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3188
[05:22:40] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3188
[05:26:59] RECOVERY - Puppet freshness on professor is OK: puppet ran at Thu Mar 15 05:26:47 UTC 2012
[05:30:01] New patchset: Asher; "removing misc::udpprofile::collector from spence" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3189
[05:30:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3189
[05:30:22] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3189
[05:30:24] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3189
[05:52:20] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:01:17] RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Thu Mar 15 06:01:10 UTC 2012
[06:04:44] RECOVERY - Puppet freshness on virt3 is OK: puppet ran at Thu Mar 15 06:04:20 UTC 2012
[06:10:17] RECOVERY - Puppet freshness on virt4 is OK: puppet ran at Thu Mar 15 06:10:11 UTC 2012
[06:23:56] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:18:05] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:20:47] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[07:47:38] RECOVERY - DPKG on hume is OK: All packages OK
[07:57:23] PROBLEM - Lucene on search9 is CRITICAL: Connection timed out
[07:59:20] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[08:02:19] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:06:13] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.001 second response time on port 8123
[08:12:40] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:18:40] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.001 second response time on port 8123
[08:29:19] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:33:13] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.003 second response time on port 8123
[08:47:55] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[08:53:55] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.001 second response time on port 8123
[09:00:22] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[09:04:16] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.006 second response time on port 8123
[09:06:33] RECOVERY - Puppet freshness on mw1020 is OK: puppet ran at Thu Mar 15 09:06:26 UTC 2012
[09:12:42] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:17:39] RECOVERY - Lucene on search3 is OK: TCP OK - 0.012 second response time on port 8123
[09:22:09] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[09:23:57] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.001 second response time on port 8123
[09:30:24] PROBLEM - Lucene on search3 is CRITICAL: Connection timed out
[09:30:24] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[09:32:12] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[09:38:39] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[09:40:36] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[09:47:41] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[09:57:53] PROBLEM - Lucene on mw1020 is CRITICAL: Connection refused
[09:59:59] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.019 second response time on port 8123
[10:22:56] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[10:28:56] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[10:39:35] RECOVERY - Lucene on search3 is OK: TCP OK - 0.011 second response time on port 8123
[10:39:44] PROBLEM - LVS Lucene on search-pool1.svc.pmtpa.wmnet is CRITICAL: Connection timed out
[10:42:13] RECOVERY - Lucene on search9 is OK: TCP OK - 0.005 second response time on port 8123
[10:42:31] RECOVERY - LVS Lucene on search-pool1.svc.pmtpa.wmnet is OK: TCP OK - 0.002 second response time on port 8123
[10:46:34] PROBLEM - carbon-cache.py on spence is CRITICAL: PROCS CRITICAL: 0 processes with command name carbon-cache.py
[11:17:37] PROBLEM - Host dataset1001 is DOWN: PING CRITICAL - Packet loss = 100%
[11:18:10] yeah yeah ignore that
[11:18:31] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:18:40] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:22:43] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:22:52] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:25:34] RECOVERY - Host dataset1001 is UP: PING OK - Packet loss = 0%, RTA = 26.43 ms
[11:53:50] New patchset: ArielGlenn; "bonded interfaces for dataset1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3191
[11:54:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3191
[11:56:19] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3191
[11:56:28] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3191
[11:56:31] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3191
[11:58:51] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:23:36] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:31:51] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:32:14] New patchset: ArielGlenn; "mount gluster publicdata volume on dataset1001 (dumps)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3192
[12:32:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3192
[12:33:39] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:36:12] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.970 seconds
[12:37:42] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.002 seconds
[12:43:04] New review: ArielGlenn; "latency between dcs could be problematic for this but let's give it a try, it's only for copy/delete..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3192
[12:43:06] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3192
[13:11:45] !log on screen as root on dataset1001, copying to gluster volume; if this causes problems feel free to shoot it. ( cp -a 20120211 /mnt/glusterpublicdata/public/enwiki/ )
[13:11:49] Logged the message, Master
[13:14:39] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:14:57] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:21:06] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.350 seconds
[13:26:57] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 5.490 seconds
[13:27:42] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[13:29:39] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[13:33:24] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:34:00] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:35:21] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.888 seconds
[13:36:42] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[13:36:42] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[13:40:18] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.564 seconds
[13:40:41] New patchset: Mark Bergsma; "Don't sign builds by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3193
[13:40:53] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3193
[13:41:07] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3193
[13:41:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3193
[13:41:48] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:43:47] New patchset: Mark Bergsma; "Move misc::package-builder into a separate file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3194
[13:44:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3194
[13:44:54] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3194
[13:44:57] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3194
[13:46:00] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.486 seconds
[13:46:36] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:51:47] !log ariel synchronized wmf-config/InitialiseSettings.php 'emergency disable of feedback dashboard'
[13:51:50] Logged the message, Master
[13:52:09] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:02:08] !log ariel synchronized wmf-config/InitialiseSettings.php 'emergency disable of feedback dashboard (right config var this time?)'
[14:02:11] Logged the message, Master
[14:02:39] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:03:06] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.796 seconds
[14:06:21] !log disabled moodbar temporarily on en wikii, see bug 35245
[14:06:24] Logged the message, Master
[14:06:46] apergos: thanks!
[14:07:42] New patchset: Mark Bergsma; "Puppetize pbuilder" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3195
[14:07:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3195
[14:08:19] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3195
[14:08:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3195
[14:09:16] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:19:55] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.770 seconds
[14:24:23] New patchset: Mark Bergsma; "Fix dependency cycle" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3199
[14:24:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3199
[14:24:59] Change abandoned: Mark Bergsma; "this would merge test in again" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3199
[14:28:19] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:47:04] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.421 seconds
[14:52:10] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.3677817857 (gt 8.0)
[14:53:40] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:56:13] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 1.40850763158
[15:04:12] New patchset: Mark Bergsma; "Fix dependency cycle" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3202
[15:04:25] New patchset: Mark Bergsma; "Fix othermirrors, setup default dist link" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3203
[15:04:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3202
[15:04:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3203
[15:04:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3202
[15:04:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3202
[15:05:18] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3203
[15:05:20] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3203
[15:06:07] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.721 seconds
[15:12:25] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:21:16] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.718 seconds
[15:28:49] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:29:16] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.466 seconds
[15:35:34] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:50:07] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.864 seconds
[15:56:25] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.901 seconds
[15:56:25] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:00:37] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.865 seconds
[16:02:43] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:06:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:12:37] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:13:04] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 8.340 seconds
[16:13:29] !log reedy synchronized php-1.19/extensions/WikimediaMaintenance/cleanupBug31576.php 'r113929'
[16:13:33] Logged the message, Master
[16:15:10] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.342 seconds
[16:19:07] New patchset: Lcarr; "Fixing icinga apache file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3204
[16:19:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3204
[16:19:39] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3204
[16:19:41] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3204
[16:21:28] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:21:37] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:35:34] New patchset: Lcarr; "Making sure all config files are readable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3206
[16:35:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3206
[16:35:55] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3206
[16:35:58] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3206
[16:38:33] PROBLEM - check_job_queue on spence is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 9,999 jobs: , commonswiki (10283)
[16:43:12] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:43:21] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:44:51] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.589 seconds
[16:53:15] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:33] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:42] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:54:36] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000
[17:01:58] New patchset: Lcarr; "trying to make exported files world readable" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3207
[17:02:10] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3207
[17:02:29] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3207
[17:02:31] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3207
[17:06:24] New patchset: Lcarr; "fix perms and purge decommissioned AFTER collecting resources" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3208
[17:06:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3208
[17:07:03] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.09612973913 (gt 8.0)
[17:07:41] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3208
[17:07:44] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3208
[17:13:12] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.538383529412
[17:16:29] !log reedy synchronized php-1.19/includes/SkinTemplate.php 'r113932'
[17:16:32] Logged the message, Master
[17:22:21] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours
[17:22:30] PROBLEM - Swift HTTP on copper is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:23:08] New patchset: Lcarr; "fixing collection" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3210
[17:23:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3210
[17:24:19] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3210
[17:24:22] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3210
[17:28:07] New patchset: Mark Bergsma; "Build with source by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3211
[17:28:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3211
[17:29:32] Change abandoned: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3211
[17:29:50] Change restored: Mark Bergsma; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3211
[17:29:58] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3211
[17:30:00] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3211
[17:31:15] !log reedy synchronized php-1.19/includes/ 'r113935'
[17:31:18] Logged the message, Master
[17:31:48] !log reedy synchronized php-1.19/resources/ 'r113935'
[17:31:51] Logged the message, Master
[17:32:24] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.425 seconds
[17:32:24] !log reedy synchronized php-1.19/languages/messages/ 'r113935'
[17:32:27] Logged the message, Master
[17:37:01] !log reedy synchronized php-1.19/includes/specials/SpecialUndelete.php 'r113936'
[17:37:04] Logged the message, Master
[17:38:09] !log reedy synchronized php-1.19/resources/mediawiki/mediawiki.util.js 'r113936'
[17:38:12] Logged the message, Master
[17:38:33] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:40:18] New patchset: Mark Bergsma; "Fix creates file name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3214
[17:40:31] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3214
[17:40:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3214
[17:40:42] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3214
[17:40:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3214
[17:41:36] !log reedy synchronized php-1.19/includes/RecentChange.php 'r113938'
[17:41:40] Logged the message, Master
[17:52:43] !log reedy synchronized php-1.19/extensions/NewUserMessage/NewUserMessage.class.php 'r113940'
[17:52:46] Logged the message, Master
[17:53:35] !log reedy synchronized php-1.19/extensions/wikihiero/modules/ext.wikihiero.css 'r113940'
[17:53:40] Logged the message, Master
[17:54:37] New patchset: Mark Bergsma; "Temporarily disable varnish package installation during package name migration" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3215
[17:54:42] !log reedy synchronized php-1.19/extensions/CheckUser/ 'r113940'
[17:54:45] Logged the message, Master
[17:54:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3215
[17:55:04] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3215
[17:55:08] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3215
[17:55:26] !log reedy synchronized php-1.19/extensions/CentralAuth/ 'r113940'
[17:55:29] Logged the message, Master
[18:02:58] Why can't I view any results on edit filter 213 on the English Wikipedia?
[18:03:23] was the "private" option changed so you cannot see results anymore?
[18:15:06] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.680 seconds [18:15:06] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.569 seconds [18:18:15] RECOVERY - Lucene on search1015 is OK: TCP OK - 0.027 second response time on port 8123 [18:24:42] PROBLEM - Varnish HTTP bits on sq67 is CRITICAL: Connection refused [18:24:51] PROBLEM - Full LVS Snapshot on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:25:00] PROBLEM - MySQL Idle Transactions on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:25:18] PROBLEM - SSH on db1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:25:27] PROBLEM - MySQL Recent Restart on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:25:36] PROBLEM - MySQL Slave Delay on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:25:45] PROBLEM - MySQL Replication Heartbeat on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:26:12] PROBLEM - MySQL Slave Running on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:26:21] PROBLEM - Disk space on db1022 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:27:42] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:27:42] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:09] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:45] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:34:10] New patchset: Ryan Lane; "Adding glusterfs cluster to gmetad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3218 [18:34:22] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3218 [18:35:12] RECOVERY - Varnish HTTP bits on sq67 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.012 seconds [18:38:03] PROBLEM - Host sq67 is DOWN: PING CRITICAL - Packet loss = 100% [18:38:45] New patchset: RobH; "added sq39 to decom due to pci training error" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3219 [18:38:57] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3218 [18:38:58] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3218 [18:38:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3219 [18:39:12] New review: RobH; "simple decom addition" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3219 [18:39:16] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3219 [18:41:12] RECOVERY - Host sq67 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms [18:42:42] PROBLEM - Lucene on search1016 is CRITICAL: Connection refused [18:48:51] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:49:45] PROBLEM - NTP on db1022 is CRITICAL: NTP CRITICAL: No response from NTP server [18:50:03] RECOVERY - SSH on db1022 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [18:51:22] New patchset: Mark Bergsma; "Make sq67-sq70 use the new automatic partitioning for varnish" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3221 [18:51:30] !log reedy synchronizing Wikimedia installation... : Running scap to deal with message changes earlier [18:51:33] Logged the message, Master [18:51:35] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3221 [18:51:44] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3221 [18:51:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3221 [18:57:24] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 255 MB (3% inode=61%): /var/lib/ureadahead/debugfs 255 MB (3% inode=61%): [18:59:04] RECOVERY - Lucene on search1016 is OK: TCP OK - 0.027 second response time on port 8123 [19:00:51] RECOVERY - Disk space on db1022 is OK: DISK OK [19:02:03] RECOVERY - MySQL Recent Restart on db1022 is OK: OK seconds since restart [19:02:12] RECOVERY - MySQL Idle Transactions on db1022 is OK: OK longest blocking idle transaction sleeps for seconds [19:02:30] RECOVERY - MySQL Slave Delay on db1022 is OK: OK replication delay seconds [19:02:57] RECOVERY - MySQL Replication Heartbeat on db1022 is OK: OK replication delay seconds [19:03:15] RECOVERY - MySQL Slave Running on db1022 is OK: OK replication [19:03:33] RECOVERY - Full LVS Snapshot on db1022 is OK: OK no full LVM snapshot volumes [19:05:30] RECOVERY - Disk space on srv223 is OK: DISK OK [19:08:12] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [19:08:48] PROBLEM - Host sq68 is DOWN: PING CRITICAL - Packet loss = 100% [19:08:48] PROBLEM - Host sq70 is DOWN: PING CRITICAL - Packet loss = 100% [19:08:48] PROBLEM - Host sq69 is DOWN: PING CRITICAL - Packet loss = 100% [19:08:57] PROBLEM - Host sq67 is DOWN: PING CRITICAL - Packet loss = 100% [19:10:09] RECOVERY - Host sq69 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [19:10:45] RECOVERY - Host sq70 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [19:11:39] PROBLEM - LVS HTTP on bits.pmtpa.wikimedia.org is CRITICAL: Connection refused [19:12:24] RECOVERY - NTP on db1022 is OK: NTP OK: Offset -0.08507752419 secs
[19:12:33] RECOVERY - Host sq68 is UP: PING OK - Packet loss = 0%, RTA = 1.69 ms [19:13:00] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 7.676 seconds [19:13:00] PROBLEM - LVS HTTPS on bits.pmtpa.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway [19:13:00] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.682 seconds [19:14:57] PROBLEM - Varnish HTTP bits on sq70 is CRITICAL: Connection refused [19:15:06] RECOVERY - Host sq67 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms [19:15:24] PROBLEM - Disk space on search1015 is CRITICAL: DISK CRITICAL - free space: /a 3398 MB (2% inode=99%): [19:15:51] PROBLEM - Varnish HTTP bits on sq69 is CRITICAL: Connection refused [19:17:28] PROBLEM - Varnish HTTP bits on sq68 is CRITICAL: Connection refused [19:17:46] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:31] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:49] PROBLEM - NTP on sq67 is CRITICAL: NTP CRITICAL: No response from NTP server [19:19:52] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.967 seconds [19:21:31] PROBLEM - Varnish HTTP bits on sq67 is CRITICAL: Connection refused [19:23:01] RECOVERY - LVS HTTP on bits.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 3911 bytes in 0.002 seconds [19:23:04] Nikerabbit: I see what you mean about scap hanging...
[19:23:19] PROBLEM - SSH on sq67 is CRITICAL: Connection refused [19:23:28] RECOVERY - LVS HTTPS on bits.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 3928 bytes in 0.007 seconds [19:26:10] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:27:22] RECOVERY - SSH on sq67 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:29:19] PROBLEM - LVS HTTP on bits.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:29:46] PROBLEM - LVS HTTPS on bits.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:40] PROBLEM - SSH on sq68 is CRITICAL: Connection refused [19:33:40] PROBLEM - SSH on sq69 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:33:58] PROBLEM - Varnish HTTP bits on sq69 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:34:52] RECOVERY - SSH on sq68 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:34:52] PROBLEM - SSH on sq70 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:35:28] RECOVERY - SSH on sq69 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:36:49] RECOVERY - SSH on sq70 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [19:37:07] RECOVERY - Disk space on search1015 is OK: DISK OK [19:37:52] PROBLEM - Swift HTTP on magnesium is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:22] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 8.488 seconds [19:40:34] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.301 seconds [19:47:46] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:50:55] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:49] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.376 seconds
[19:54:58] Hi there! I noticed the change of a language name on Wikipedia, and I'm wondering if there's a place where people are discussing how languages are spelled [19:55:22] The instance I noticed was changing the "Nnapolitanu" Wikipedia to be spelled like "Nnapolitano" instead [19:55:31] blumenkraft, yes, but not one where you can really hope to get something fixed [19:55:43] PROBLEM - NTP on sq68 is CRITICAL: NTP CRITICAL: No response from NTP server [19:55:47] blumenkraft, https://translatewiki.net/wiki/CLDR [19:55:57] I am wondering where the discussion would take place to decide such a name switch [19:56:06] blumenkraft, te l'ho appena detto :) [19:56:17] *I just told you [19:57:03] Ah okay, didn't know about CLDR [19:57:32] grazie (thanks) [19:57:58] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:01:07] New patchset: Lcarr; "adding in all old nagios groups to purge" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3223 [20:01:19] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3223 [20:01:25] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.335 seconds [20:03:59] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3223 [20:04:01] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3223 [20:06:13] PROBLEM - NTP on sq69 is CRITICAL: NTP CRITICAL: No response from NTP server [20:07:34] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:07:34] PROBLEM - NTP on sq70 is CRITICAL: NTP CRITICAL: No response from NTP server [20:16:43] RECOVERY - Varnish HTTP bits on sq67 is OK: HTTP OK HTTP/1.1 200 OK - 630 bytes in 0.003 seconds [20:18:04] PROBLEM - Swift HTTP on zinc is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:19] RECOVERY - LVS HTTPS on bits.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 3916 bytes in 9.013 seconds [20:23:28] RECOVERY - Varnish HTTP bits on sq69 is OK: HTTP OK HTTP/1.1 200 OK - 630 bytes in 0.009 seconds [20:24:49] PROBLEM - Varnish HTTP bits on sq67 is CRITICAL: Connection refused [20:26:46] RECOVERY - NTP on sq69 is OK: NTP OK: Offset -0.04886293411 secs [20:26:55] RECOVERY - Varnish HTTP bits on sq67 is OK: HTTP OK HTTP/1.1 200 OK - 630 bytes in 0.007 seconds [20:27:22] RECOVERY - LVS HTTP on bits.pmtpa.wikimedia.org is OK: HTTP OK HTTP/1.1 200 OK - 3910 bytes in 0.009 seconds [20:28:43] RECOVERY - Varnish HTTP bits on sq70 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.002 seconds [20:32:10] RECOVERY - NTP on sq70 is OK: NTP OK: Offset -0.021941185 secs [20:36:22] RECOVERY - Varnish HTTP bits on sq68 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.007 seconds [20:37:02] !log reedy synchronized php-1.19/extensions/WikimediaMaintenance/cleanupBug31576.php [20:37:05] Logged the message, Master
[20:43:25] New patchset: Lcarr; "correcting service group config" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3224 [20:43:38] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3224 [20:43:45] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3224 [20:43:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3224 [20:47:22] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 5.756 seconds [20:47:40] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 6.439 seconds [20:50:13] PROBLEM - Auth DNS on ns2.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:53:40] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:54:07] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:09:16] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/ 'r113957' [21:09:19] Logged the message, Master [21:11:40] !log catrope synchronized php-1.19/extensions/ArticleFeedback/modules/ext.articleFeedback/ext.articleFeedback.js 'r113958' [21:11:44] Logged the message, Master [21:12:16] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 9.500 seconds [21:14:58] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 9.811 seconds [21:17:16] anyone awake in here? [21:18:47] !log catrope synchronized php-1.19/extensions/ArticleFeedback/modules/ext.articleFeedback/ext.articleFeedback.js [21:18:50] Logged the message, Master [21:18:59] !log That was r113959 [21:19:02] Logged the message, Mr. Obvious [21:19:52] anyone know why we're down?
[21:19:56] i can't save edits [21:20:26] !log catrope synchronized wmf-config/CommonSettings.php 'Bump AFTv4 event logging percentage from 0.27% to 1%' [21:20:28] TenPoundHammer, where and with what error [21:20:29] Logged the message, Master [21:20:33] TenPoundHammer: What error message do you get? [21:20:40] the generic wikimedia foundation error [21:20:44] "Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes." [21:20:45] that one [21:20:45] On save it gives generic error [21:20:48] Is there any small text below? [21:20:50] PHP fatal error in /usr/local/apache/common-local/php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php line 421: [21:20:51] Call to undefined method EditPage::getRevIdFetched() [21:20:55] Oh fuck [21:20:57] This is my fault [21:20:58] PHP fatal error in /usr/local/apache/common-local/php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php line 421: [21:20:58] Call to undefined method EditPage::getRevIdFetched() [21:20:59] Fixing [21:21:03] lol [21:21:55] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php 'fix fatal' [21:21:58] Logged the message, Master [21:22:03] OK should be fixed now [21:22:27] yay [21:22:38] yep [21:22:54] confirmed, fixed. [21:24:34] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:25:20] !log disabled credit cards on donate.wikimedia.org [21:25:23] Logged the message, Master [21:25:56] !log K4-713 synchronized payments cluster to r113956 [21:26:00] Logged the message, Master [21:28:03] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php 'r113961' [21:28:06] Logged the message, Master [21:28:37] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 4.208 seconds [21:39:41] Reedy: did you figure out the reason? [21:41:29] Evening guys. 
Would it be yourselves I would approach with issues on the en mobile site? [21:43:02] BarkingFish: or #wikimedia-mobile [21:43:05] is broken? [21:43:11] part of it is, yes [21:43:45] I have an HTC Wizard running WM5, with Internet Explorer - I went to the link today to permanently disable the mobile site, and it's not clickable [21:44:02] !log catrope synchronized php-1.19/resources/startup.js 'touch' [21:44:06] Logged the message, Master [21:45:43] PROBLEM - Mobile WAP site on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:46:55] anyone in here know about coding on the Chinese Wikipedia [21:47:07] have a user in Help asking about a template [21:47:12] inline cite [21:47:13] ? [21:47:49] RECOVERY - Mobile WAP site on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 1642 bytes in 7.461 seconds [21:49:08] New patchset: Bhartshorne; "increasing speed of the swiftcleaner so it has a chance to finish its scan in a reasonable amount of time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3226 [21:49:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3226 [21:49:24] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3226 [21:49:26] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3226 [21:53:38] New patchset: Bhartshorne; "attempt to get a timestamp into the swiftcleaner log name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3227 [21:53:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3227 [21:54:34] Nikerabbit: nope, not had a chance to poke at it further [21:54:52] hahahahaha...toolserver says the WP is not a valid wiki...
[22:04:54] !log catrope synchronized wmf-config/CommonSettings.php 'Raise AFTv4 event logging percentage from 1% to 5%' [22:04:57] Logged the message, Master [22:05:49] PROBLEM - HTTP on ekrem is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:07:22] !log catrope synchronized php-1.19/resources/startup.js 'touch' [22:07:26] Logged the message, Master [22:09:43] RECOVERY - HTTP on ekrem is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 4.260 seconds [22:09:53] Shearonink, isn't one of the toolserver servers still dead? [22:09:59] BarkingFish: that's for #wikimedia-mobile [22:10:03] BarkingFish: 2 of them [22:10:04] I don't know [22:10:17] you didn't ask clarissa? [22:10:19] all i know is that toolserver says that Wp is not a valid wiki [22:10:29] last I checked [22:10:32] Shearonink: which wiki? [22:10:45] WIKIPEDIA [22:10:57] toolserver says that WP is not a valid wiki. [22:11:20] http://lists.wikimedia.org/pipermail/toolserver-l/2012-March/004828.html [22:11:27] there is no wiki "WP" nor is there a wiki "WIKIPEDIA" in our database. There is enwiki for example [22:15:56] DaBPunkt: run a toolserver edit count on someone from English WP [22:16:06] All I can tell you is what I saw on the page [22:16:11] Shearonink: url? [22:16:25] http://toolserver.org/~tparis/pcount/index.php?name=Hoppingalong&lang=en&wiki=wikipedia [22:16:39] en.wikipedia.org is not a valid wiki [22:16:43] ^^ [22:16:54] speak to tparis [22:17:12] afaik those are copies of soxred93's tools [22:17:16] his account expired [22:17:29] I'm just mentioning it in here...thought someone would want to know. [22:19:33] The Wikipedia mirrors site is down...didn't know that, sorry if I missed it. [22:21:48] Maybe someone could update the greeting lines at the top so you tech experts don't have to keep answering the same questions about toolserver.
[22:23:02] !log catrope synchronized wmf-config/CommonSettings.php 'Raise AFTv4 event logging percentage from 5% to 25%' [22:23:05] Logged the message, Master [22:23:36] Shearonink: there is an extra toolserver-channel at #wikimedia-toolserver [22:23:38] Shearonink: This is not the toolserver channel, and this problem is not a common one AFAIK [22:24:24] !log catrope synchronized php-1.19/resources/startup.js 'touch' [22:24:27] Logged the message, Master [22:24:56] ok, fine, sorry if I misunderstood the channel's parameters. [22:26:43] jeremyb, I'll refer that issue I mentioned to you, onto #wikimedia-mobile, see if anyone else has the same issue [22:26:44] No worries [22:29:59] RoanKattouw: Hey [22:33:35] [[Tech]]; MZMcBride; /* Query Limit from mobile or web Client */ updated section; https://meta.wikimedia.org/w/index.php?diff=3571844&oldid=3569666&rcid=3186146 [22:34:48] New patchset: Pyoungmeister; "no lucene monitoring for indexers, as it does not work properly..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3228 [22:35:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3228 [22:36:21] New review: Dzahn; "yep, thanks!" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3228 [22:36:24] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3228 [22:40:08] RoanKattouw: I think gerrit-wm should be quieted in here. [22:40:44] +1 [22:41:15] also, is there a way to ignore a user only on a single channel?
[22:41:40] oh, snitch [22:42:43] PROBLEM - MySQL Slave Delay on db1047 is CRITICAL: CRIT replication delay 205 seconds [22:44:37] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3227 [22:44:40] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3227 [22:46:28] PROBLEM - MySQL Replication Heartbeat on db1047 is CRITICAL: CRIT replication delay 351 seconds [22:52:01] Joan: Get Ryan to tell it to not report shit here [22:52:57] I'll file a bug. [22:56:15] https://bugzilla.wikimedia.org/show_bug.cgi?id=35248 [22:57:38] !log catrope synchronized php-1.19/extensions/ArticleFeedback/modules/jquery.articleFeedback/jquery.articleFeedback.js [22:57:41] Logged the message, Master [22:59:05] !log catrope synchronized wmf-config/CommonSettings.php 'Bump AFTv4 event logging percentage from 25% to 100%' [22:59:08] Logged the message, Master [22:59:39] !log catrope synchronized php-1.19/resources/startup.js 'touch' [22:59:42] Logged the message, Master [23:05:07] RECOVERY - MySQL Replication Heartbeat on db1047 is OK: OK replication delay 0 seconds [23:05:34] RECOVERY - MySQL Slave Delay on db1047 is OK: OK replication delay 0 seconds [23:10:53] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/modules/ext.articleFeedbackv5/ext.articleFeedbackv5.js 'r113972' [23:10:56] Logged the message, Master [23:12:21] !log awjrichards synchronized php/extensions/MobileFrontend/templates/DisableTemplate.php 'r113973, fixes bug 35249' [23:12:24] Logged the message, Master [23:12:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:13:49] PROBLEM - MySQL Replication Heartbeat on db42 is CRITICAL: CRIT replication delay 199 seconds [23:14:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.684 seconds [23:14:43] PROBLEM - MySQL Slave Delay on db42 is CRITICAL: CRIT 
replication delay 234 seconds [23:17:33] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php 'r113974' [23:17:37] Logged the message, Master [23:21:38] New patchset: Ryan Lane; "Removing gerrit bot from wikimedia-tech" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3230 [23:21:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3230 [23:21:58] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3230 [23:22:01] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3230 [23:28:49] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [23:30:46] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [23:36:19] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 4 seconds [23:37:13] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 1 seconds [23:37:49] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [23:37:49] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [23:49:13] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 7.034 seconds