[00:29:14] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours
[00:33:17] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours
[00:37:20] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[00:42:17] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[00:46:01] Expensive parser function count: 919/500
[00:46:09] When did this decrease from 2000?
[00:52:20] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours
[01:02:52] Joan: what do you mean?
[01:04:21] It used to be 2000. Now it's 500.
[01:12:41] Joan: where?
[01:12:55] On the English Wikipedia.
[01:14:46] It's the same everywhere
[01:15:43] Joan: it's been that way since July 2011
[01:16:04] Lame.
[01:16:09] at least
[01:51:44] New patchset: Dzahn; "put swift production servers into a nagios hostgroup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3067
[01:51:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3067
[01:53:23] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3067
[01:53:26] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3067
[02:00:58] New review: Dzahn; "looks like this has been done manually meanwhile" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2927
[02:01:01] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2927
[02:12:27] Nemo_bis: around?
[02:13:21] New patchset: Dzahn; "oops, what did i do there, fix the exec for nagios to exim group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3068
[02:13:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3068
[02:15:28] New patchset: Dzahn; "oops, what did i do there, fix the exec for nagios to exim group, meh, get rid of whitespace too" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3068
[02:15:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3068
[02:16:09] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3068
[02:16:12] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3068
[02:17:28] !log LocalisationUpdate completed (1.19) at Mon Mar 12 02:17:28 UTC 2012
[02:17:33] Logged the message, Master
[02:22:04] did a bot just ask another bot to log a message?
[02:22:42] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/2989
[02:23:45] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/3024
[02:24:28] JRWR: not rare
[02:24:50] why is Danny_B|backup awake? ;)
[02:25:17] insomnia
[02:25:22] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/3036
[02:25:38] Danny_B|backup: schlaf!
[02:26:26] insomnia please release me and let me dream...
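The "Expensive parser function count: 919/500" report above means a page made 919 calls to "expensive" parser functions against a per-page budget of 500; the discussion is about that budget having dropped from 2000 (the relevant MediaWiki setting is $wgExpensiveParserFunctionLimit). The Python sketch below only illustrates the counter-versus-limit idea; it is not MediaWiki's actual code, and the class name is invented.

    # Hypothetical sketch of a per-page "expensive call" budget. A page that
    # makes 919 expensive calls against a limit of 500 is over budget, which
    # is what the 919/500 report above means.
    class ExpensiveCallCounter:
        def __init__(self, limit=500):
            self.limit = limit
            self.count = 0

        def record_call(self):
            """Count one expensive call; return False once the budget is exhausted."""
            self.count += 1
            return self.count <= self.limit

        def report(self):
            return f"Expensive parser function count: {self.count}/{self.limit}"

    counter = ExpensiveCallCounter(limit=500)
    for _ in range(919):
        counter.record_call()
    print(counter.report())  # -> Expensive parser function count: 919/500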
[02:34:47] Danny_B|backup: I'm sorry :/ [02:49:06] New patchset: Dzahn; "swift prod servers to monitoring group, that didnt appear to work because they dont include base" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3069 [02:49:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3069 [02:50:06] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3069 [02:50:09] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3069 [03:01:33] New patchset: Dzahn; "meh, duplicate def from including standard, add nagios hostgroup in base definition then and do this in role::swift" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3070 [03:01:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3070 [03:03:03] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3070 [03:03:06] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3070 [03:33:56] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours [03:33:56] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours [04:19:16] RECOVERY - DPKG on spence is OK: All packages OK [04:19:34] RECOVERY - Disk space on spence is OK: DISK OK [04:20:01] RECOVERY - profiler-to-carbon on spence is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/udpprofile/sbin/profiler-to-carbon [04:20:19] RECOVERY - profiling collector on spence is OK: PROCS OK: 1 process with command name collector [04:20:19] RECOVERY - RAID on spence is OK: OK: no RAID installed [04:56:47] New patchset: Dzahn; "change Swift HTTP check to user port 8080" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3071 [04:56:59] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3071 [04:59:21] New review: Dzahn; "should be able to check them all with the same command, when using port 8080 " [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3071 [04:59:24] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3071 [05:04:13] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours [05:14:16] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours [05:15:10] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours [05:15:10] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours [05:16:13] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours [05:16:13] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours [05:16:13] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours [05:19:13] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours [05:19:13] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [05:19:13] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours [05:19:13] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours [05:20:16] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours [05:23:33] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours [05:23:33] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours [05:24:07] New patchset: Dzahn; "..or not. make check_http_swift flexible for different ports and add an if-statement on hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3072 [05:24:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3072 [05:26:05] New patchset: Dzahn; "make check_http_swift flexible for different ports and add an if-statement on hostname" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3072 [05:26:17] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3072 [05:27:31] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3072 [05:27:34] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3072 [05:32:51] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours [05:48:00] PROBLEM - SSH on copper is CRITICAL: Connection refused [05:49:21] PROBLEM - Memcached on copper is CRITICAL: Connection refused [05:51:35] RECOVERY - Swift HTTP on magnesium is OK: HTTP OK HTTP/1.1 200 OK - 2359 bytes in 1.064 seconds [05:51:53] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [05:51:53] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [05:52:29] RECOVERY - SSH on copper is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [05:53:50] RECOVERY - Memcached on copper is OK: TCP OK - 0.027 second response time on port 11211 [05:58:11] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [05:58:11] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [05:58:12] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours [05:58:12] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [05:58:13] PROBLEM - Puppet freshness on knsq25 is CRITICAL: Puppet has not run in the last 10 hours [05:58:13] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [06:07:02] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Puppet has not run in the last 10 hours [06:07:03] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [06:07:03] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [06:07:04] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [06:07:04] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [06:07:05] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [06:07:05] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [06:07:06] PROBLEM - Puppet freshness on ssl3004 is 
CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [06:08:05] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [06:11:05] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [06:11:06] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [06:11:06] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [06:11:07] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [06:12:08] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [06:13:11] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [06:13:11] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [06:14:05] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours [06:14:05] PROBLEM - Puppet freshness on amssq61 is CRITICAL: Puppet has not run in the last 10 hours [06:14:05] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [06:16:02] PROBLEM - Puppet freshness on knsq20 is CRITICAL: Puppet has not run in the last 10 hours [06:27:27] RECOVERY - Swift HTTP on copper is OK: HTTP OK HTTP/1.1 200 OK - 2359 bytes in 0.127 seconds [06:28:48] RECOVERY - Swift HTTP on zinc is OK: HTTP OK HTTP/1.1 200 OK - 2359 bytes in 0.065 seconds [07:10:20] Any ops available and free now? [07:32:39] Hydriz: TimStarling may be around, Perhaps if you can mention why you want some ops help someone will ping you when they are around [07:32:55] not really important [07:33:07] just hoping that someone can enable the WikiLove extension on Incubator [07:33:19] and do the configuration change of the Incubator fast [07:33:41] Bugs 35161 and 31209 [08:05:20] New review: Dzahn; "i changed this command in change 3072 to fix monitoring on zinc/magnesium/copper who need port 8080 ..." 
[operations/puppet] (production); V: 1 C: 1; - https://gerrit.wikimedia.org/r/3036 [10:30:29] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours [10:34:32] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [10:38:35] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [10:43:32] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [10:53:35] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [11:14:44] PROBLEM - MySQL Slave Delay on db1018 is CRITICAL: CRIT replication delay 181 seconds [11:16:05] PROBLEM - MySQL Replication Heartbeat on db1018 is CRITICAL: CRIT replication delay 184 seconds [11:24:20] RECOVERY - MySQL Replication Heartbeat on db1018 is OK: OK replication delay 0 seconds [11:24:47] RECOVERY - MySQL Slave Delay on db1018 is OK: OK replication delay 0 seconds [11:28:56] New patchset: Mark Bergsma; "Support multiple varnish instances in the ganglia metrics module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3081 [11:29:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3081 [11:30:35] New patchset: Mark Bergsma; "Support multiple varnish instances in the ganglia metrics module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3081 [11:30:47] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3081 [11:34:52] New patchset: Mark Bergsma; "Support multiple varnish instances in the ganglia metrics module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3081 [11:35:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3081 [11:36:41] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3081 [11:36:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3081 [11:37:27] PROBLEM - Puppet freshness on mw53 is CRITICAL: Puppet has not run in the last 10 hours [12:03:05] New patchset: Mark Bergsma; "Small fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3082 [12:03:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3082 [12:03:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3082 [12:03:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3082 [12:17:15] New patchset: Mark Bergsma; "Monitor both frontend and backend varnish instances for mobile as well" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3083 [12:17:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3083 [12:17:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3083 [12:17:49] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3083 [12:19:35] New patchset: Mark Bergsma; "Add eqiad upload varnish cluster to torrus collection" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3084 [12:19:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3084
[12:19:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3084
[12:19:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3084
[12:21:58] New review: Mark Bergsma; "Please do per-data center groups, following the same name style as has been done for squids/varnish ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3067
[13:28:08] could anyone with shell access determine the current version of pdftotext on our servers? https://bugzilla.wikimedia.org/show_bug.cgi?id=35122
[13:35:06] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[13:35:06] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[14:25:54] Nemo_bis: 3.02
[14:33:08] New patchset: Mark Bergsma; "Support dual-layer varnish clusters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3086
[14:33:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3086
[14:34:28] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3086
[14:34:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3086
[14:46:43] Reedy, hmmm – thanks
[14:48:20] rather, as expected
[14:49:49] New patchset: Mark Bergsma; "Support dual layer varnish setups, add upload-eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3087
[14:50:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3087
[14:51:37] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3087
[14:51:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3087
[14:54:46] New patchset: Mark Bergsma; "Missing %" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3088
[14:54:57] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3088
[14:55:01] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/3088
[14:55:03] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3088
[14:56:12] how fast does the full text search get updated, after new text is added?
[14:56:51] is there a queue length for that? or is it reindexed at regular intervals?
[14:57:23] regular intervals
[14:57:29] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.8116703571 (gt 8.0)
[14:57:38] daily? twice-weekly?
[14:57:51] some things are done daily I believe, full rebuild once a week
[14:57:53] something like that
[14:58:19] is it possible to dig out some information from that search engine, e.g. which are the most common words?
[14:58:40] no idea
[14:58:43] or which words are considered to match which ones?
[14:58:53] is it solr?
[14:59:03] it's lucene
[14:59:07] right
[14:59:43] for Wikisource, if I proofread a book, it could be useful to get updated stats on which words are present in that book (or a group of books), to see which ones are errors
[15:00:51] if lucene can dump its internal index, that dump could be published on dumps.wikimedia.org
[15:01:38] New patchset: Mark Bergsma; "Set prefix to empty for single layer servers, add a decommission option for dual layer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3089
[15:01:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3089
[15:02:07] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3089
[15:02:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3089
[15:05:35] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[15:09:10] New patchset: Mark Bergsma; "Merge squidlayer and varnishlayer" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3090
[15:09:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3090
[15:09:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3090
[15:09:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3090
[15:15:38] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[15:16:41] PROBLEM - Puppet freshness on amssq40 is CRITICAL: Puppet has not run in the last 10 hours
[15:16:41] PROBLEM - Puppet freshness on knsq23 is CRITICAL: Puppet has not run in the last 10 hours
[15:17:35] PROBLEM - Puppet freshness on amssq49 is CRITICAL: Puppet has not run in the last 10 hours
[15:17:35] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[15:17:35] PROBLEM - Puppet freshness on amssq56 is CRITICAL: Puppet has not run in the last 10 hours
[15:20:35] PROBLEM - Puppet freshness on knsq21 is CRITICAL: Puppet has not run in the last 10 hours
[15:20:35] PROBLEM - Puppet freshness on ms6 is CRITICAL: Puppet has not run in the last 10 hours
[15:20:35] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours
[15:20:35] PROBLEM - Puppet freshness on knsq24 is CRITICAL: Puppet has not run in the last 10 hours
[15:21:38] PROBLEM - Puppet freshness on amssq62 is CRITICAL: Puppet has not run in the last 10 hours
[15:22:41] PROBLEM - Host manutius is DOWN: PING CRITICAL - Packet loss = 100%
[15:24:38] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:38] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:38] RECOVERY - Host manutius is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[15:34:41] PROBLEM - Puppet freshness on ms-be5 is CRITICAL: Puppet has not run in the last 10 hours
[15:42:02] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 14.75302 (gt 8.0)
[15:45:43] New patchset: Mark Bergsma; "Create a separate vcl_config hash, as retry5x/cache4xx are not backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3091
[15:45:54] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..."
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/3091 [15:47:11] New patchset: Mark Bergsma; "Create a separate vcl_config hash, as retry5x/cache4xx are not backend options" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3091 [15:47:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3091 [15:48:37] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3091 [15:48:40] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3091 [15:49:05] PROBLEM - DPKG on snapshot3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [15:51:11] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 185 MB (2% inode=61%): /var/lib/ureadahead/debugfs 185 MB (2% inode=61%): [15:53:35] PROBLEM - Puppet freshness on amslvs1 is CRITICAL: Puppet has not run in the last 10 hours [15:53:35] PROBLEM - Puppet freshness on amssq31 is CRITICAL: Puppet has not run in the last 10 hours [15:57:02] New patchset: Mark Bergsma; "Require GET or HEAD for upload, no POST" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3092 [15:57:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3092 [15:57:36] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3092 [15:57:39] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3092 [15:59:35] PROBLEM - Puppet freshness on amssq35 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on amslvs3 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on amssq38 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on amssq41 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on amssq50 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on amssq52 is CRITICAL: Puppet has not run in the last 10 hours [15:59:35] PROBLEM - Puppet freshness on knsq17 is CRITICAL: Puppet has not run in the last 10 hours [15:59:36] PROBLEM - Puppet freshness on amssq58 is CRITICAL: Puppet has not run in the last 10 hours [15:59:36] PROBLEM - Puppet freshness on knsq25 is CRITICAL: Puppet has not run in the last 10 hours [15:59:37] PROBLEM - Puppet freshness on cp3001 is CRITICAL: Puppet has not run in the last 10 hours [15:59:37] PROBLEM - Puppet freshness on knsq29 is CRITICAL: Puppet has not run in the last 10 hours [16:05:48] RECOVERY - Disk space on srv221 is OK: DISK OK [16:08:30] PROBLEM - Puppet freshness on amslvs4 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq33 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq36 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq44 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq39 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq51 is CRITICAL: Puppet has not run in the last 10 hours [16:08:30] PROBLEM - Puppet freshness on amssq55 is CRITICAL: Puppet has not run in the last 10 hours [16:08:31] PROBLEM - Puppet 
freshness on amssq54 is CRITICAL: Puppet has not run in the last 10 hours [16:08:31] PROBLEM - Puppet freshness on amssq53 is CRITICAL: Puppet has not run in the last 10 hours [16:08:32] PROBLEM - Puppet freshness on amssq60 is CRITICAL: Puppet has not run in the last 10 hours [16:08:32] PROBLEM - Puppet freshness on amssq59 is CRITICAL: Puppet has not run in the last 10 hours [16:08:33] PROBLEM - Puppet freshness on knsq26 is CRITICAL: Puppet has not run in the last 10 hours [16:08:33] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [16:08:34] PROBLEM - Puppet freshness on ssl3004 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on amssq32 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on amssq47 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on amssq48 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on amssq45 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on knsq27 is CRITICAL: Puppet has not run in the last 10 hours [16:09:24] PROBLEM - Puppet freshness on knsq28 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on amssq34 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on amssq37 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on amssq57 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on cp3002 is CRITICAL: Puppet has not run in the last 10 hours [16:12:24] PROBLEM - Puppet freshness on knsq19 is CRITICAL: Puppet has not run in the last 10 hours [16:12:25] PROBLEM - Puppet freshness on knsq16 is CRITICAL: Puppet has not run in the last 10 hours [16:12:25] PROBLEM - Puppet freshness on knsq18 is CRITICAL: Puppet has not run in the last 10 hours [16:12:26] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [16:12:26] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [16:13:27] PROBLEM - Puppet freshness on amssq46 is CRITICAL: Puppet has not run in the last 10 hours [16:14:30] PROBLEM - Puppet freshness on hooft is CRITICAL: Puppet has not run in the last 10 hours [16:14:30] PROBLEM - Puppet freshness on knsq22 is CRITICAL: Puppet has not run in the last 10 hours [16:15:24] PROBLEM - Puppet freshness on amssq43 is CRITICAL: Puppet has not run in the last 10 hours [16:15:24] PROBLEM - Puppet freshness on amssq61 is CRITICAL: Puppet has not run in the last 10 hours [16:15:24] PROBLEM - Puppet freshness on nescio is CRITICAL: Puppet has not run in the last 10 hours [16:17:30] PROBLEM - Puppet freshness on knsq20 is CRITICAL: Puppet has not run in the last 10 hours [16:18:24] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.3391611607 (gt 8.0) [16:45:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2989 [16:45:18] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2989 [16:48:15] PROBLEM - Host cp1021 is DOWN: PING CRITICAL - Packet loss = 100% [16:54:15] Related to our jokes at lunch on Friday: http://news.ycombinator.com/item?id=3693690 -- of 
course this makes the front page of HN.
[16:54:46] New patchset: Lcarr; "Fixing fixme's from previous change https://gerrit.wikimedia.org/r/#change,2936" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3093
[16:54:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3093
[16:56:39] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.56872919643 (gt 8.0)
[16:56:58] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/3093
[16:57:15] RECOVERY - Host cp1021 is UP: PING OK - Packet loss = 0%, RTA = 26.80 ms
[17:00:33] PROBLEM - Host cp1022 is DOWN: PING CRITICAL - Packet loss = 100%
[17:02:03] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3093
[17:02:06] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3093
[17:02:57] PROBLEM - Varnish HTTP upload-frontend on cp1021 is CRITICAL: Connection refused
[17:03:15] PROBLEM - Varnish traffic logger on cp1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[17:06:42] RECOVERY - Host cp1022 is UP: PING OK - Packet loss = 0%, RTA = 26.49 ms
[17:11:30] PROBLEM - Varnish HTTP upload-frontend on cp1022 is CRITICAL: Connection refused
[17:11:52] !log nikerabbit synchronized php-1.19/extensions/WebFonts/ 'Updating WebFonts'
[17:11:56] Logged the message, Master
[17:12:06] PROBLEM - Varnish traffic logger on cp1022 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[17:14:28] !log nikerabbit synchronized php-1.19/extensions/Narayam/ 'Updating Narayam'
[17:14:31] Logged the message, Master
[17:14:57] RECOVERY - Varnish HTTP upload-frontend on cp1021 is OK: HTTP OK HTTP/1.1 200 OK - 643 bytes in 0.053 seconds
[17:15:08] AaronSchulz: ping?
[17:16:45] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 14.5513432432 (gt 8.0)
[17:22:47] !log nikerabbit synchronized php-1.19/languages/ 'r113635'
[17:22:50] Logged the message, Master
[17:24:16] !log nikerabbit synchronized php-1.19/includes/Title.php 'r113635'
[17:24:19] Logged the message, Master
[17:26:14] New patchset: Mark Bergsma; "Setup dependency for varnish::logging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3094
[17:26:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3094
[17:26:47] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3094
[17:26:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3094
[17:33:20] RECOVERY - Varnish HTTP upload-frontend on cp1022 is OK: HTTP OK HTTP/1.1 200 OK - 641 bytes in 0.054 seconds
[17:33:47] RECOVERY - Varnish traffic logger on cp1022 is OK: PROCS OK: 2 processes with command name varnishncsa
[17:42:56] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.77458491071 (gt 8.0)
[17:44:14] running scap
[17:45:12] RECOVERY - Varnish traffic logger on cp1021 is OK: PROCS OK: 2 processes with command name varnishncsa
[17:48:56] !log nikerabbit synchronizing Wikimedia installation... : Deploying updated Translate
[17:49:00] Logged the message, Master
[17:54:17] Nikerabbit: Detected bug in an extension! Hook PageTranslationHooks::preventRestrictedTranslations failed to return a value; should return true to continue hook processing or false to abort.
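The warning just above spells out MediaWiki's hook contract: every handler registered for a hook must return true to let the remaining handlers run, or false to abort the chain, and returning nothing at all is reported as a bug in the extension. A small Python analogy of that dispatch rule follows; it is not MediaWiki code, just the semantics the warning itself describes, with invented handler names.

    # Analogy of the hook contract: True continues the chain, False aborts it,
    # and a handler that returns nothing (None) is flagged as buggy, much like
    # the "failed to return a value" warning above.
    def run_hook(handlers, *args):
        for handler in handlers:
            result = handler(*args)
            if result is None:
                print(f"Detected bug: hook {handler.__name__} failed to return a value")
                continue
            if result is False:
                return False  # abort further processing
        return True

    def good_handler(page):
        return True  # continue hook processing

    def buggy_handler(page):
        pass  # forgets to return anything -> triggers the warning

    run_hook([good_handler, buggy_handler], "Translations:Example/1")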
[17:54:22] Reedy:
[17:54:24] New patchset: Mark Bergsma; "Throw cp1022-1028 into the upload eqiad pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3095
[17:54:25] crap
[17:54:33] where?
[17:54:36] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3095
[17:54:44] I added a comment on CR on mw.org
[17:55:00] The code looks fine at trunk
[17:55:41] Reedy: when did you see that? I'm just running scap
[17:55:52] Nikerabbit: mw.org in CodeReview
[17:55:56] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3095
[17:55:58] Didn't happen again, so guessing transient
[17:55:59] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3095
[17:56:25] let's assume so
[17:56:29] I also see other transient errors
[17:57:06] I see missing files and such
[17:58:00] Reedy: scap taking >15 minutes doesn't make me very confident that if something breaks I can revert back easily :/
[17:58:15] I tend to use sync-file/sync-dir
[17:58:23] and then run scap later on to do message style updates
[17:58:36] Reedy: that's what I did
[17:58:43] except I didn't sync-dir Translate first
[18:03:44] New patchset: Ryan Lane; "This seems silly, but let's try it." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3096
[18:03:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3096
[18:04:12] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3096
[18:04:15] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3096
[18:07:22] Reedy: scap is still running after 20 minutes, normal?
[18:14:20] Reedy: 30 minutes, normal?
[18:14:42] no new messages though
[18:15:14] New patchset: Lcarr; "Declaring nagios monitor machines specifically" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3097
[18:15:22] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/3097
[18:16:06] New patchset: Lcarr; "Declaring nagios monitor machines specifically" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3097
[18:16:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3097
[18:18:16] New patchset: Ryan Lane; "You suck puppet. I hate you so much." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3098
[18:18:28] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3098
[18:18:35] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3098
[18:18:38] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3098
[18:19:42] !log Assuming scap has finished
[18:19:46] Logged the message, Master
[18:21:20] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.24173776786 (gt 8.0)
[18:51:58] Nikerabbit: did it not return? :/
[18:52:03] Sometimes you might need to press enter a few times
[18:58:36] Reedy: tried twice
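The exchange above is about deployment strategy: the log shows a full scap still running after 30 minutes, while Reedy prefers sync-file/sync-dir to push only the paths that changed and to run scap later for message updates. The sketch below is not the actual scap or sync-file tooling; it is a generic illustration, with hypothetical host names and a placeholder destination path, of why pushing one directory to a list of hosts is much cheaper than re-syncing everything.

    # Generic illustration (not Wikimedia's deployment scripts): push a single
    # path to each target host with rsync, which only transfers what changed.
    import subprocess

    HOSTS = ["mw1", "mw2", "mw3"]  # hypothetical Apache hostnames

    def sync_path(path, hosts=HOSTS, dest_root="/usr/local/apache/common/"):
        # dest_root is a placeholder document root, not the real layout.
        for host in hosts:
            # -a preserves permissions/times, -z compresses; unchanged files
            # are skipped, so a one-directory push finishes quickly.
            subprocess.run(
                ["rsync", "-az", path, f"{host}:{dest_root}{path}"],
                check=True,
            )

    sync_path("php-1.19/extensions/Translate/")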
[19:04:57] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[19:22:12] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 14.1719694643 (gt 8.0)
[19:36:18] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.18729848214 (gt 8.0)
[19:43:21] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 1.74 ms
[20:32:10] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours
[20:36:04] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours
[20:40:07] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours
[20:40:27] Reedy: ping pong :-]
[20:40:42] Reedy: regarding bug 14407 and allowing central auth for all wikimedia.org domain
[20:41:20] mmmm
[20:42:55] Platonides says that upload.wikimedia.org is potential security issue :-]
[20:43:16] we could fix that by hosting bits and images on another domain 8-))
[20:43:20] As per the bug, I'm not going to go and add them all to every site :p
[20:43:33] We cannot have SUL on *.wikimedia.org
[20:43:40] if that's what you're talking about
[20:43:46] At least not with a star cookie
[20:43:46] yeah that is
[20:44:07] do you have any idea what is under wikimedia.org domain which is not hosted by us?
[20:44:14] there's a few
[20:44:29] yeah, I suggest we could have a per project workaround - ie if you logged into pt wiki, you could be logged into br.wm.o at the same time, for example
[20:45:04] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours
[20:45:25] that would probably make it
[20:46:43] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.70933339286
[20:47:51] Also, bits.wm.o and upload.wm.o are cookieless on purpose
[20:50:10] My suggestion was so we can fix the sites that want it, and not do it for everyone unnecessarily
[20:55:07] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours
[21:38:49] PROBLEM - Puppet freshness on mw53 is CRITICAL: Puppet has not run in the last 10 hours
[21:43:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:44:30] !log Running foreachwiki extensions/WikimediaMaintenance/cleanupBug31576.php in screen as me on hume
[21:44:33] Logged the message, Master
[21:45:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.751 seconds
[21:51:14] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/ArticleFeedbackv5.hooks.php 'r113671'
[21:51:17] Logged the message, Master
[21:51:25] !log catrope synchronized php-1.19/extensions/ArticleFeedbackv5/modules/jquery.articleFeedbackv5/jquery.articleFeedbackv5.js 'r113671'
[21:51:28] Logged the message, Master
[22:11:58] PROBLEM - Disk space on srv219 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=61%): /var/lib/ureadahead/debugfs 0 MB (0% inode=61%):
[22:15:52] PROBLEM - Disk space on srv222 is CRITICAL: DISK CRITICAL - free space: / 192 MB (2% inode=61%): /var/lib/ureadahead/debugfs 192 MB (2% inode=61%):
[22:15:52] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 8 MB (0% inode=61%): /var/lib/ureadahead/debugfs 8 MB (0% inode=61%):
[22:15:52] PROBLEM - Disk space on srv223 is CRITICAL: DISK CRITICAL - free space: / 187 MB (2% inode=61%): /var/lib/ureadahead/debugfs 187 MB (2% inode=61%):
[22:15:52] PROBLEM - Disk space on srv220 is CRITICAL: DISK CRITICAL - free space: / 85 MB (1% inode=61%): /var/lib/ureadahead/debugfs 85 MB (1% inode=61%):
[22:15:52] PROBLEM - Disk space on srv221 is CRITICAL: DISK CRITICAL - free space: / 78 MB (1% inode=61%): /var/lib/ureadahead/debugfs 78 MB (1% inode=61%):
[22:19:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:23:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.751 seconds
[22:25:55] RECOVERY - Disk space on srv222 is OK: DISK OK
[22:26:04] RECOVERY - Disk space on srv219 is OK: DISK OK
[22:29:58] PROBLEM - Disk space on srv224 is CRITICAL: DISK CRITICAL - free space: / 248 MB (3% inode=61%): /var/lib/ureadahead/debugfs 248 MB (3% inode=61%):
[22:30:16] RECOVERY - Disk space on srv223 is OK: DISK OK
[22:30:16] RECOVERY - Disk space on srv220 is OK: DISK OK
[22:31:55] RECOVERY - Disk space on srv221 is OK: DISK OK
[22:31:55] RECOVERY - Disk space on srv224 is OK: DISK OK
[22:43:15] New patchset: Lcarr; "Adding in customized init file, preparing icinga for initial install Removing contact groups so that we don't get deluged by email in initial install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3099
[22:43:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3099
[22:55:20] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3099
[22:55:23] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3099
[22:55:45] !log synchronized payments cluster to r113679, and tweaked the anti-fraud rules
[22:55:49] Logged the message, Master
[22:57:31] New patchset: Lcarr; "including nagios configuration group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3100
[22:57:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3100
[22:58:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:02:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.453 seconds
[23:11:07] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3097
[23:11:40] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3100
[23:11:43] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3100
[23:16:11] New patchset: Lcarr; "fixing network::checks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3101
[23:16:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3101
[23:16:35] Change abandoned: Lcarr; "abandoning due to conflicts, redoing change" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3097
[23:17:15] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3101
[23:17:17] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3101
[23:17:37] Change abandoned: Lcarr; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2956
[23:27:01] <\bMike\b> How long has WMF been using Ubuntu as the main server OS?
[23:29:59] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/3036
[23:32:30] \bMike\b: since about 2007
[23:33:18] <\bMike\b> cool, thanks
[23:34:16] TimStarling: I'll bite, what was it before Ubuntu?
[23:34:29] fedora
[23:34:44] and RH 9 before that
[23:36:04] * chrismcmahon would have guessed SuSE, but RH makes sense
[23:36:42] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[23:36:42] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[23:38:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:44:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.812 seconds
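The "Puppet freshness ... CRITICAL: Puppet has not run in the last 10 hours" alerts that recur throughout this log are a staleness check: the monitoring host compares the time of each node's last recorded Puppet run against a 10-hour threshold. The Python sketch below only shows that comparison in a Nagios-like form; it is not the plugin actually used here, and the state-file path is a placeholder.

    # Simplified staleness check: CRITICAL if the last Puppet run (approximated
    # here by a state file's modification time) is more than 10 hours old.
    import os
    import sys
    import time

    THRESHOLD_HOURS = 10

    def check_freshness(state_file="/var/lib/puppet/state/state.yaml"):
        try:
            age_hours = (time.time() - os.path.getmtime(state_file)) / 3600
        except OSError:
            print("CRITICAL: no record of a Puppet run")
            return 2  # Nagios CRITICAL
        if age_hours > THRESHOLD_HOURS:
            print(f"CRITICAL: Puppet has not run in the last {THRESHOLD_HOURS} hours")
            return 2
        print(f"OK: last Puppet run {age_hours:.1f} hours ago")
        return 0  # Nagios OK

    if __name__ == "__main__":
        sys.exit(check_freshness())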