[00:00:50] (CR) Ottomata: [C: 2] "Got a verbal +1 from cmjohnson1" [operations/puppet] - https://gerrit.wikimedia.org/r/83956 (owner: Ottomata) [00:00:54] (PS2) Dzahn: RT #4671, RT #5681, remove quickipedia.org [operations/dns] - https://gerrit.wikimedia.org/r/81424 [00:01:29] (CR) Dzahn: [C: 2] remove vikipedio.org and vikipedio.com, RT #4673, RT #4674, RT #5681 [operations/dns] - https://gerrit.wikimedia.org/r/81414 (owner: Dzahn) [00:01:49] (CR) Dzahn: [C: 2] RT #4671, RT #5681, remove quickipedia.org [operations/dns] - https://gerrit.wikimedia.org/r/81424 (owner: Dzahn) [00:04:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:07:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.837 second response time [00:10:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:11:28] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [00:12:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 2.211 second response time [00:14:38] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:15:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:16:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 2.160 second response time [01:08:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.978 second response time [01:57:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:58:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [02:09:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:10:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.833 second response time [02:16:32] !log LocalisationUpdate completed (1.22wmf16) at Thu Sep 12 02:16:32 UTC 2013 [02:16:37] Logged the message, Master [02:23:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:24:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [02:32:02] !log LocalisationUpdate completed (1.22wmf15) at Thu Sep 12 02:32:02 UTC 2013 [02:32:05] Logged the message, Master [02:43:09] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.22wmf16 [02:43:12] Logged the message, Master [02:43:23] (PS1) Reedy: Wikipedia to 1.22wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83975 [02:44:09] (CR) Reedy: [C: 2] Wikipedia to 1.22wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83975 (owner: Reedy) [02:44:20] (Merged) jenkins-bot: Wikipedia to 1.22wmf16
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83975 (owner: Reedy) [02:48:07] Reedy: ! [02:48:40] We were still fixing VisualEditor under the assumption wmf16 was going out tomorrow [02:49:13] lol [02:49:16] Wikitech says one thing [02:49:21] MediaWiki says another [02:50:47] RoanKattouw: You're fixing for wmf17, though. [02:51:07] Which was supposed to go out to "test" wikis this morning too [02:51:10] But isn't branched as of yet [02:51:54] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 12 02:51:54 UTC 2013 [02:51:59] Logged the message, Master [02:52:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:52:49] James_F: Oooh OK [02:53:00] Reedy: Please don't branch it yet : [02:53:02] :) [02:54:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [02:55:00] I'm still somewhat confused [02:55:11] Sorry [02:55:13] Go ahead with wmf16 [02:55:17] But hold off on wmf17 [02:55:19] I already did [02:55:21] ;) [02:55:23] Also, l10nupdate just happened and bits is unhappy [02:55:37] * RoanKattouw doesn't see a !log for wmf16?
[02:55:46] [03:43:09] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.22wmf16 [02:55:57] Oh, pre-commit, right [03:02:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:03:07] (PS2) TTO: Give testwiki some custom namespaces [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78016 [03:03:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:22:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [03:26:21] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [03:27:26] Well, isn't that useful [03:27:37] Extension:TemplateData is using gzdecode but our apaches don't have it [03:29:32] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [03:32:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:33:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [03:37:38] Aha [03:37:41] PHP 5.4 [03:49:18] !log reedy synchronized php-1.22wmf16/extensions/TemplateData/TemplateData.php [03:49:21] Logged the message, Master [03:50:17] Reedy: What did you do there?
[03:50:32] A live hack to add a replacement for gzdecode [03:50:48] To stop http://commons.wikimedia.org/w/api.php?action=templatedata&titles=Template:Information&format=jsonfm fataling [03:50:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=54058 [03:51:25] And going to commit it properly as it seems to work [03:51:28] Nice [03:54:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:32] Reedy: omg you're polluting the global namespace [03:55:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.019 second response time [03:55:44] lol [03:59:59] !log catrope synchronized php-1.22wmf16/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'touch' [04:00:02] Logged the message, Master [04:00:07] Anyone want to merge https://gerrit.wikimedia.org/r/#/c/83981 ? [04:02:24] Looking [04:17:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:18:30] !log reedy synchronized php-1.22wmf16/extensions/TemplateData [04:18:33] Logged the message, Master [04:18:52] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.822 second response time [04:19:32] (PS1) Reedy: Move php to php-1.22wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83984 [04:20:16] (CR) Reedy: [C: 2] Move php to php-1.22wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83984 (owner: Reedy) [04:20:28] (Merged) jenkins-bot: Move php to php-1.22wmf16 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83984 (owner: Reedy) [05:22:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:27:40] PROBLEM - Puppetmaster HTTPS on stafford
is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:28:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [05:47:31] PROBLEM - RAID on ms-be10 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [05:52:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.410 second response time [05:57:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:58:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.933 second response time [06:22:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:42] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [07:03:17] (PS1) Jalexander: Add elwikivoyage logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 [07:03:25] (CR) jenkins-bot: [V: -1] Add elwikivoyage logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 (owner: Jalexander) [07:06:14] (PS2) Jalexander: Add elwikivoyage logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 [08:30:33] (CR) TTO: "(1 comment)" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 (owner: Jalexander) [09:37:11] (CR) Peachey88: "Is there a bugzilla report for this?"
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 (owner: Jalexander) [09:44:46] (PS1) Petr Onderka: Removed ChangeVisitor: it is not used anywhere [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83993 [09:44:47] (PS1) Petr Onderka: Group compression of diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83994 [09:49:03] (PS2) Petr Onderka: Group compression of diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83994 [09:51:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:53:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [09:57:18] PROBLEM - MySQL Processlist on db1051 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 40 statistics [09:59:18] RECOVERY - MySQL Processlist on db1051 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 14 statistics [10:11:14] (PS3) Petr Onderka: Group compression of diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83994 [10:18:12] (CR) Petr Onderka: [C: 2 V: 2] Removed ChangeVisitor: it is not used anywhere [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83993 (owner: Petr Onderka) [10:18:41] (CR) Petr Onderka: [C: 2 V: 2] Group compression of diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83994 (owner: Petr Onderka) [10:30:19] PROBLEM - Host search23 is DOWN: PING CRITICAL - Packet loss = 100% [10:33:10] RECOVERY - Host search23 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [10:35:13] PROBLEM - search indices - check lucene status page on search23 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:37:00] RECOVERY - search indices - check lucene status page on search23 is OK: HTTP OK: HTTP/1.1 200 OK - 269 bytes in 0.055
second response time [10:43:28] (CR) Jalexander: "There is not a bugzilla, request was sent directly to LCA because they knew we had worked with the logo switch over." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 (owner: Jalexander) [10:44:55] (PS3) Jalexander: Add elwikivoyage logo [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83989 [11:13:33] PROBLEM - MySQL Processlist on db1009 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 1 copy to table, 113 statistics [11:14:30] RECOVERY - MySQL Processlist on db1009 is OK: OK 0 unauthenticated, 0 locked, 1 copy to table, 3 statistics [12:26:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:27:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.249 second response time [13:00:44] PROBLEM - MySQL Processlist on db1043 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 50 statistics [13:01:45] RECOVERY - MySQL Processlist on db1043 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 11 statistics [13:04:14] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [13:07:29] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:11:36] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:12:36] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.294 second response time [14:24:15] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:25:16] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.131 second response time [14:28:38] (PS1) Petr Onderka: Improved clearing of index caches [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83998 [14:32:56] (CR) Petr Onderka: [C: 2 V: 2] Improved clearing of index caches [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/83998 (owner: Petr Onderka) [14:55:45] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [14:56:35] uh miracle [14:58:39] but probably a false one https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large [14:58:44] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:31:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:32:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [15:52:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:53:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [16:16:13] (PS1) Petr Onderka: Don't write empty indexes [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/84005 [16:20:02] (CR) Umherirrender: "Yes, there are valid urls, so they should be handled all the same."
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83225 (owner: Umherirrender) [16:27:24] (PS2) Petr Onderka: Don't write empty indexes [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/84005 [16:27:59] (CR) Petr Onderka: [C: 2 V: 2] Don't write empty indexes [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/84005 (owner: Petr Onderka) [16:51:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:52:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.857 second response time [16:57:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:59:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [17:14:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:15:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.181 second response time [17:16:26] !log reedy synchronized php-1.22wmf16/extensions/DataValues/ [17:16:29] Logged the message, Master [17:18:16] !log reedy synchronized php-1.22wmf16/extensions/Wikibase [17:18:19] Logged the message, Master [17:19:04] !log gallium : killed some long lasting java Jenkins thread [17:19:07] Logged the message, Master [17:19:23] Reedy: are you willing to bring the site down while we are all attending a talk ?
:-D [17:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [17:39:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:40:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [17:46:34] PROBLEM - MySQL Processlist on db1052 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 87 statistics [17:47:59] (PS1) BBlack: fix distcheck [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84012 [17:48:00] (PS1) BBlack: Backport nlist.[ch] changes from gdnsd [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84013 [17:48:35] RECOVERY - MySQL Processlist on db1052 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 2 statistics [17:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:53:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [17:55:50] !log reedy synchronized php-1.22wmf17 'Initial deploy of code files' [17:55:53] Logged the message, Master [17:56:25] * RobH is amused that Reedy is deploying [17:56:28] !log reedy synchronized docroot and w [17:56:31] Logged the message, Master [17:56:48] 'guys the sites down, all the engineers to their computers!' [17:56:57] (not really, ignore the above) [17:57:22] I should pull up ganglia I guess [17:57:24] no [17:57:33] I should not. someone over there in a normal tz should! [17:57:42] such a reflex... [17:59:53] apergos: don't steal the outage from those of us who are in meeting.
;] [18:00:06] I won't, believe me [18:00:13] I have food to cook and eat, toys to play with [18:00:16] news to watch [18:02:25] (Abandoned) Brion VIBBER: Add Wikipedia Zero IP ranges for Morocco [operations/puppet] - https://gerrit.wikimedia.org/r/48149 (owner: Brion VIBBER) [18:02:46] !log reedy synchronized php-1.22wmf16/extensions/Wikibase [18:05:36] (PS1) Reedy: wmf17 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84018 [18:05:47] (CR) Reedy: [C: 2] wmf17 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84018 (owner: Reedy) [18:07:03] (CR) jenkins-bot: [V: -1] wmf17 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84018 (owner: Reedy) [18:07:07] (Merged) jenkins-bot: wmf17 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84018 (owner: Reedy) [18:11:45] !log reedy Started syncing Wikimedia installation... : testwiki to 1.22wmf17 and build localisation cache [18:11:48] Logged the message, Master [18:28:42] !log reedy Finished syncing Wikimedia installation...
: testwiki to 1.22wmf17 and build localisation cache [18:28:46] Logged the message, Master [18:29:30] (CR) BBlack: [C: 2 V: 2] fix distcheck [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84012 (owner: BBlack) [18:29:41] (CR) BBlack: [C: 2 V: 2] Backport nlist.[ch] changes from gdnsd [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84013 (owner: BBlack) [18:29:54] (PS1) BBlack: 1.2 release stuff [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84019 [18:30:08] (CR) BBlack: [C: 2 V: 2] 1.2 release stuff [operations/software/varnish/libvmod-netmapper] - https://gerrit.wikimedia.org/r/84019 (owner: BBlack) [18:30:26] (PS1) BBlack: Merge branch 'master' into debian [operations/software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/84020 [18:30:27] (PS1) BBlack: updates for 1.2 (no multi-arch, new binary) [operations/software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/84021 [18:30:40] (CR) BBlack: [C: 2 V: 2] Merge branch 'master' into debian [operations/software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/84020 (owner: BBlack) [18:30:51] (CR) BBlack: [C: 2 V: 2] updates for 1.2 (no multi-arch, new binary) [operations/software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/84021 (owner: BBlack) [18:31:12] (PS1) Hashar: base debian dir [operations/debs/python-gear] - https://gerrit.wikimedia.org/r/84022 [18:31:16] yeahhh [18:33:30] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: loginwiki, mediawikiwiki, test2wiki and testwikidatawiki to 1.22wmf17 [18:33:34] Logged the message, Master [18:34:40] (PS1) Reedy: loginwiki, mediawikiwiki, testwiki, test2wiki and testwikidatawiki to 1.22wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84023 [18:35:29]
(CR) Reedy: [C: 2] loginwiki, mediawikiwiki, testwiki, test2wiki and testwikidatawiki to 1.22wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84023 (owner: Reedy) [18:35:38] (Merged) jenkins-bot: loginwiki, mediawikiwiki, testwiki, test2wiki and testwikidatawiki to 1.22wmf17 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/84023 (owner: Reedy) [18:36:10] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [18:38:25] (PS1) Yuvipanda: Add mosh to bastion hosts [operations/puppet] - https://gerrit.wikimedia.org/r/84024 [18:39:19] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:41:31] (PS1) BBlack: deploy libvmod-netmapper to mobile varnishes [operations/puppet] - https://gerrit.wikimedia.org/r/84026 [18:52:52] (PS1) RobH: adding bastion4001 to dhcp [operations/puppet] - https://gerrit.wikimedia.org/r/84028 [18:53:26] (PS1) Lcarr: adding cr2-ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/84029 [18:54:02] (CR) Lcarr: [C: 2] adding cr2-ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/84029 (owner: Lcarr) [18:54:03] (CR) RobH: [C: 2] adding bastion4001 to dhcp [operations/puppet] - https://gerrit.wikimedia.org/r/84028 (owner: RobH) [18:55:07] (PS3) RobH: adding ulsfo.wmnet to search domains on iron [operations/puppet] - https://gerrit.wikimedia.org/r/83950 [18:57:04] (CR) RobH: [C: 2] adding ulsfo.wmnet to search domains on iron [operations/puppet] - https://gerrit.wikimedia.org/r/83950 (owner: RobH) [18:57:34] !log running ntpdate on lvs servers [18:57:37] Logged the message, Mistress of the network gear.
[18:58:22] PROBLEM - Host wikiversity-lb.eqiad.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:861:ed1a::7 [18:58:38] PROBLEM - Host wikivoyage-lb.eqiad.wikimedia.org is DOWN: CRITICAL - Network Unreachable (208.80.154.243) [18:58:50] great that's me [18:58:53] restarting pybal [18:59:03] mmm, downtime! [18:59:06] (not really) [18:59:10] RECOVERY - Host wikivoyage-lb.eqiad.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 0.68 ms [18:59:37] hey [18:59:38] paravoid: false alarm [18:59:40] what's up? [18:59:44] !log restarting pybal on lvs's [18:59:45] ntpd update on lvs server [18:59:47] Logged the message, Mistress of the network gear. [18:59:52] argh [18:59:53] that was me running ntpdate update and turns out pybal doesn't like that [18:59:56] paravoid: :D [18:59:56] sorry [19:00:28] RECOVERY - Host wikiversity-lb.eqiad.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [19:00:49] https://rt.wikimedia.org/Ticket/Display.html?id=4084 [19:00:55] rt ticket for lvs date issue [19:02:03] PROBLEM - Apache HTTP on mw1218 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:03] PROBLEM - Apache HTTP on mw1105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1095 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:04] PROBLEM - Apache HTTP on mw1112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:14] PROBLEM - Apache HTTP on mw1054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1083 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:15] PROBLEM - Apache HTTP on mw1212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:16] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:16] PROBLEM - Apache HTTP on mw1032 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:17] PROBLEM - Apache HTTP on mw1107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:17] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:18] PROBLEM - Apache HTTP on mw1033 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:18] PROBLEM - Apache HTTP on mw1210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:19] PROBLEM - Apache HTTP on mw1219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:19] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:20] PROBLEM - Apache HTTP on mw1091 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:20] PROBLEM - Apache HTTP on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:21] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:21] PROBLEM - Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:22] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:22] PROBLEM - Apache HTTP on mw1035 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:23] PROBLEM - Apache HTTP on mw1076 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[19:02:23] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:24] PROBLEM - Apache HTTP on mw1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:24] PROBLEM - Apache HTTP on mw1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:25] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:25] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:26] PROBLEM - Apache HTTP on mw1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:26] PROBLEM - Apache HTTP on mw1175 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:52] RECOVERY - Apache HTTP on mw1218 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.054 second response time [19:02:52] RECOVERY - Apache HTTP on mw1081 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.061 second response time [19:02:52] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [19:02:52] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [19:02:52] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:02:53] RECOVERY - Apache HTTP on mw1220 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:02:53] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.072 second response time [19:03:01] PROBLEM - Apache HTTP on mw1162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:02] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:03:02] RECOVERY - Apache HTTP on mw1083 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [19:03:02] 
RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:03:02] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:03:02] PROBLEM - Apache HTTP on mw1209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:02] PROBLEM - Apache HTTP on mw1216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:03] RECOVERY - Apache HTTP on mw1212 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.049 second response time [19:03:03] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.043 second response time [19:03:04] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:03:04] RECOVERY - Apache HTTP on mw1082 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.068 second response time [19:03:05] RECOVERY - Apache HTTP on mw1032 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:03:05] RECOVERY - Apache HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [19:03:06] RECOVERY - Apache HTTP on mw1107 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:03:06] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:07] RECOVERY - Apache HTTP on mw1219 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.052 second response time [19:03:07] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.038 second response time [19:03:08] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [19:03:08] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.056 
second response time [19:03:09] RECOVERY - Apache HTTP on mw1033 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:03:09] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:03:10] RECOVERY - Apache HTTP on mw1035 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [19:03:10] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.109 second response time [19:03:11] RECOVERY - Apache HTTP on mw1076 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:03:12] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.056 second response time [19:03:12] RECOVERY - Apache HTTP on mw1087 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.079 second response time [19:03:12] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:26] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:03:26] RECOVERY - Apache HTTP on mw1022 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.044 second response time [19:03:26] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:03:26] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [19:03:26] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.068 second response time [19:03:26] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.050 second response time [19:03:27] RECOVERY - Apache HTTP on mw1186 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.278 second response time [19:03:28] (PS1)
10Ottomata: role/analytics/kafka.pp - changing hostnames of labs kafka instances [operations/puppet] - 10https://gerrit.wikimedia.org/r/84030 [19:03:31] ok the apache things weren't me though [19:03:41] probably was indirectly [19:03:54] sorry about making your phone go off mark [19:03:56] RECOVERY - Apache HTTP on mw1209 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [19:03:56] RECOVERY - Apache HTTP on mw1214 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.052 second response time [19:03:56] RECOVERY - Apache HTTP on mw1162 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.677 second response time [19:03:56] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.218 second response time [19:03:56] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 62314 bytes in 0.237 second response time [19:04:06] meh, time has to get fixed at some piont [19:04:10] no one was woken up ;] [19:04:14] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.846 second response time [19:04:22] so, lvs boxes don't keep time [19:04:23] it's ok [19:04:25] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:04:26] leave it please :) [19:04:39] I'd like to see this fixed, but not like this :) [19:04:49] (03PS2) 10Ottomata: role/analytics/kafka.pp - changing hostnames of labs kafka instances [operations/puppet] - 10https://gerrit.wikimedia.org/r/84030 [19:04:58] PROBLEM - Apache HTTP on mw1183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:03] well, someone who has authority update https://rt.wikimedia.org/Ticket/Display.html?id=4084 on how foks should proceed ;] [19:05:06] this may not have been the best time to do it ;) [19:05:33] hehe [19:05:35] PROBLEM - Apache HTTP on mw1071 is CRITICAL: CRITICAL - Socket 
timeout after 10 seconds [19:05:35] PROBLEM - Apache HTTP on mw1037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:35] RECOVERY - Apache HTTP on mw1210 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.501 second response time [19:05:35] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:35] PROBLEM - Apache HTTP on mw1063 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:35] PROBLEM - Apache HTTP on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:05:38] (CR) Ottomata: [C: 2 V: 2] role/analytics/kafka.pp - changing hostnames of labs kafka instances [operations/puppet] - https://gerrit.wikimedia.org/r/84030 (owner: Ottomata) [19:05:55] updated the ticket [19:06:04] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:06:05] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.207 second response time [19:06:12] still unhappy [19:06:23] so I've seen this happen before twice and I have no idea why this happens [19:06:27] RECOVERY - Apache HTTP on mw1071 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.312 second response time [19:06:27] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.771 second response time [19:06:31] it's like they're losing their balance [19:06:54] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.292 second response time [19:07:26] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.049 second response time [19:07:26] RECOVERY - Apache HTTP on mw1037 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [19:07:34] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.058 second response time [19:07:34] PROBLEM - Apache
HTTP on mw1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:07:35] PROBLEM - Apache HTTP on mw1091 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:07:35] PROBLEM - Apache HTTP on mw1215 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:07:35] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:11] i think pybal is depooling servers, the rest get overloaded, they're depooled too [19:08:23] the ones depooled as the first batch get pooled again [19:08:25] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.235 second response time [19:08:26] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.748 second response time [19:08:27] and this keeps going for a while [19:08:33] last time around I increased the depool threshold [19:08:36] PROBLEM - Apache HTTP on mw1107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:36] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:36] PROBLEM - Apache HTTP on mw1076 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:36] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:08:36] PROBLEM - Apache HTTP on mw1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:26] RECOVERY - Apache HTTP on mw1215 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.595 second response time [19:09:26] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.123 second response time [19:09:26] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.489 second response time [19:09:26] RECOVERY - Apache HTTP on mw1107 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.139 second response time [19:09:30] pybal is not happy [19:09:34] fetch failed by the dozens 
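The feedback loop described above (PyBal depools slow servers, the remaining ones absorb their traffic and time out in turn, the first batch gets pooled again) can be sketched as a toy simulation. This is a minimal sketch, not PyBal's actual code: the function name `simulate_pool`, the per-server capacity, the traffic figure, and the all-or-nothing health check are assumptions made for illustration. It also assumes that the depool threshold is the fraction of servers the balancer keeps pooled no matter how many fail their checks.

```python
import math


def simulate_pool(total, capacity, load, depool_threshold, failed_at_start, rounds=4):
    """Toy model of a load-balanced pool; returns the pooled-server count per round.

    Assumptions (hypothetical, not taken from PyBal):
    - load is spread evenly over the pooled servers;
    - a pooled server times out when its share of the load exceeds capacity;
    - a depooled server is idle, so it passes its next check and is repooled;
    - the balancer never lets the pool drop below total * depool_threshold.
    """
    # Minimum number of servers that stay pooled; the epsilon absorbs float error.
    floor = math.ceil(total * depool_threshold - 1e-9)
    pooled = max(total - failed_at_start, floor)
    history = [pooled]
    for _ in range(rounds):
        if pooled == 0 or load / pooled <= capacity:
            # No pooled server is overloaded (or the pool is empty):
            # every server passes its check and is pooled.
            pooled = total
        else:
            # Pooled servers time out and the idle ones swap in,
            # but the threshold caps how far the pool may shrink.
            pooled = max(total - pooled, floor)
        history.append(pooled)
    return history


# 100 servers that each handle 10 requests/s, 900 requests/s of traffic,
# and 15 servers that fail their first health check.
flapping = simulate_pool(100, 10, 900, depool_threshold=0.0, failed_at_start=15)
stable = simulate_pool(100, 10, 900, depool_threshold=0.9, failed_at_start=15)
print(flapping)
print(stable)
```

In this model, a pool with no threshold flaps between 85 and 15 servers indefinitely, while a threshold of 0.9 keeps 90 servers pooled, each stays within capacity, and the pool settles at 100.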
[19:09:35] PROBLEM - Apache HTTP on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1092 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1068 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1034 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:35] PROBLEM - Apache HTTP on mw1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:36] PROBLEM - Apache HTTP on mw1062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:36] PROBLEM - Apache HTTP on mw1178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:37] PROBLEM - Apache HTTP on mw1211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:37] PROBLEM - Apache HTTP on mw1063 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:38] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:38] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.566 second response time [19:09:39] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:50] well, are they getting overloaded? [19:10:15] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:15] PROBLEM - Apache HTTP on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:19] nope. hm. 
[19:10:25] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:10:26] RECOVERY - Apache HTTP on mw1076 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:10:26] RECOVERY - Apache HTTP on mw1087 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.066 second response time [19:10:26] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:10:26] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [19:10:26] RECOVERY - Apache HTTP on mw1082 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.878 second response time [19:10:26] RECOVERY - Apache HTTP on mw1178 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.603 second response time [19:10:27] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.049 second response time [19:10:27] i think they are, momentarily [19:10:34] RECOVERY - Apache HTTP on mw1211 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.025 second response time [19:10:35] RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.854 second response time [19:10:35] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.463 second response time [19:10:35] RECOVERY - Apache HTTP on mw1062 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.414 second response time [19:10:35] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:35] PROBLEM - Apache HTTP on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:10:37] so it doesn't appear in graphs [19:10:40] well, we could restart them [19:10:56] want to switch depool-threshold for now and let puppet overwrite it back 
to normal? [19:10:57] no this would actually make things worse I think [19:11:05] RECOVERY - Apache HTTP on mw1096 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [19:11:06] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.397 second response time [19:11:26] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:11:26] RECOVERY - Apache HTTP on mw1068 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.589 second response time [19:11:26] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.311 second response time [19:11:34] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.469 second response time [19:11:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:11:53] mark: are you debugging this?
[19:12:03] I see mw* being 100% CPU on all CPUs [19:12:04] PROBLEM - Apache HTTP on mw1080 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:05] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:09] or maybe 80% [19:12:12] i'm looking at it yes [19:12:15] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:34] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.017 second response time [19:12:35] ok [19:12:55] RECOVERY - Apache HTTP on mw1080 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [19:13:06] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:06] PROBLEM - Apache HTTP on mw1179 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:06] PROBLEM - Apache HTTP on mw1049 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:06] PROBLEM - Apache HTTP on mw1166 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:14] RECOVERY - Apache HTTP on mw1176 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.054 second response time [19:13:15] PROBLEM - Apache HTTP on mw1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:13:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.351 second response time [19:13:56] RECOVERY - Apache HTTP on mw1049 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.203 second response time [19:13:56] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.606 second response time [19:14:05] RECOVERY - Apache HTTP on mw1179 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.463 second response time [19:14:05] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:14:05] PROBLEM - Apache HTTP on mw1031 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:14:10] (PS2) Hashar: base debian dir [operations/debs/python-gear] - https://gerrit.wikimedia.org/r/84022 [19:14:55] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [19:14:55] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.061 second response time [19:14:55] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:15:05] RECOVERY - Apache HTTP on mw1042 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.392 second response time [19:15:05] PROBLEM - Apache HTTP on mw1209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:05] PROBLEM - Apache HTTP on mw1112 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:05] PROBLEM - Apache HTTP on mw1167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:10] I think if we make pybal stop depooling servers it'll fix itself very quickly [19:15:24] just did that [19:15:35] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1092 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1100 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:35] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:36] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:36] PROBLEM - Apache HTTP on mw1051 is CRITICAL: CRITICAL - Socket timeout after 10
seconds [19:15:37] PROBLEM - Apache HTTP on mw1210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:37] PROBLEM - Apache HTTP on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:38] PROBLEM - Apache HTTP on mw1063 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:38] PROBLEM - Apache HTTP on mw1219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:39] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:39] PROBLEM - Apache HTTP on mw1018 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:40] PROBLEM - Apache HTTP on mw1066 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:40] what did you do? [19:15:45] PROBLEM - Apache HTTP on mw1086 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:55] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:15:55] PROBLEM - Apache HTTP on mw1185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:05] PROBLEM - Apache HTTP on mw1105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:06] PROBLEM - Apache HTTP on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:06] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:07] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [19:16:07] PROBLEM - Apache HTTP on mw1220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:08] PROBLEM - Apache HTTP on mw1218 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:08] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:15] PROBLEM - Apache HTTP on mw1174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:15] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:15] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:25] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:16:25] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [19:16:25] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.977 second response time [19:16:35] PROBLEM - Apache HTTP on mw1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:36] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:36] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:36] PROBLEM - Apache HTTP on mw1068 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:36] PROBLEM - Apache HTTP on mw1071 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:42] leslie was just trying to ensure you guys had an alternative to the talks. [19:16:57] please just stay home next time or something?
[19:17:56] PROBLEM - Apache HTTP on mw1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:05] PROBLEM - Apache HTTP on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - Apache HTTP on mw1094 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - LVS HTTP IPv4 on wikidata-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - LVS HTTP IPv6 on wikidata-lb.esams.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - Apache HTTP on mw1089 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:06] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:07] PROBLEM - Apache HTTP on mw1093 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:07] PROBLEM - LVS HTTP IPv6 on wikidata-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:22] fuck [19:18:29] PROBLEM - Apache HTTP on mw1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1040 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1038 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:30] PROBLEM - Apache HTTP on mw1043 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:31] PROBLEM - Apache HTTP on mw1055 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:31] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:32] 
PROBLEM - Apache HTTP on mw1057 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:32] PROBLEM - Apache HTTP on mw1103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:33] PROBLEM - Apache HTTP on mw1096 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:33] PROBLEM - Apache HTTP on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:34] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:34] PROBLEM - Apache HTTP on mw1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:35] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:35] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:44] PROBLEM - Apache HTTP on mw1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:44] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:44] PROBLEM - Apache HTTP on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:44] PROBLEM - LVS HTTP IPv4 on wikidata-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:06] RECOVERY - LVS HTTPS IPv4 on wikinews-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 80099 bytes in 2.284 second response time [19:20:06] RECOVERY - LVS HTTP IPv4 on wikinews-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 79568 bytes in 0.467 second response time [19:20:15] RECOVERY - Apache HTTP on mw1187 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.139 second response time [19:20:15] RECOVERY - Apache HTTP on mw1184 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.871 second response time [19:20:15] RECOVERY - Apache HTTP on mw1219 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.238 second response time [19:20:15] RECOVERY - Apache HTTP on mw1178 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 808 bytes in 6.305 second response time [19:20:15] RECOVERY - Apache HTTP on mw1164 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.909 second response time [19:20:15] RECOVERY - Apache HTTP on mw1210 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.293 second response time [19:20:15] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.328 second response time [19:20:16] RECOVERY - Apache HTTP on mw1212 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.851 second response time [19:20:16] RECOVERY - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 757 bytes in 9.359 second response time [19:20:17] RECOVERY - Apache HTTP on mw1074 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.354 second response time [19:20:17] RECOVERY - Apache HTTP on mw1217 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.158 second response time [19:20:18] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.416 second response time [19:20:18] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.576 second response time [19:20:18] it's recovering [19:20:19] RECOVERY - Apache HTTP on mw1171 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.737 second response time [19:20:19] RECOVERY - Apache HTTP on mw1037 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.903 second response time [19:20:20] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.484 second response time [19:20:20] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:21] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:21] PROBLEM - Frontend Squid HTTP on amssq42 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds [19:20:22] PROBLEM - Apache HTTP on mw1030 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:22] RECOVERY - Apache HTTP on mw1185 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.632 second response time [19:20:23] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.826 second response time [19:20:23] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.673 second response time [19:20:24] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.712 second response time [19:20:24] RECOVERY - LVS HTTP IPv4 on wikidata-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.0 301 Moved Permanently - 586 bytes in 0.137 second response time [19:20:26] \o/ [19:20:28] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.227 second response time [19:20:29] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.595 second response time [19:20:29] RECOVERY - Apache HTTP on mw1218 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.901 second response time [19:20:29] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.671 second response time [19:20:29] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:32] maybe [19:20:32] rise servers, riiiiisssseeeee [19:20:45] there was a big lvs spike [19:21:01] sean says innodb transaction spike too [19:21:13] come on icinga-wm [19:21:22] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=LVS+loadbalancers+eqiad&m=cpu_report&s=by+name&mc=2&g=network_report [19:21:30] NTPD, the greatest thread to the cluster. 
[19:21:33] squid retries I think [19:21:35] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.051 second response time [19:21:35] RECOVERY - Apache HTTP on mw1162 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.068 second response time [19:21:35] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.089 second response time [19:21:35] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:21:35] RECOVERY - Apache HTTP on mw1083 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.442 second response time [19:21:35] threat even [19:21:36] RECOVERY - Apache HTTP on mw1040 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:21:36] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [19:21:37] RECOVERY - Apache HTTP on mw1080 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.068 second response time [19:21:37] RECOVERY - Apache HTTP on mw1064 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.555 second response time [19:21:38] RECOVERY - Apache HTTP on mw1089 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [19:21:38] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.868 second response time [19:21:40] damn it my sarcastic comment is ruined. 
[19:21:43] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.043 second response time [19:21:43] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:21:43] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [19:21:43] RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.071 second response time [19:21:53] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.438 second response time [19:22:14] dontcha hate it when your great punchline is ruined by a typo [19:22:23] RECOVERY - Apache HTTP on mw1033 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.085 second response time [19:22:45] yep. [19:23:14] seems to have recovered I think [19:23:22] naw, i think i'm the greatest threat to the cluster [19:23:31] mark: what did you do? just change depool-threshold to 1? [19:23:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:23:38] to .9 [19:23:46] ok [19:24:09] so this is a problem, it's the third time this has happened the past month [19:24:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.475 second response time [19:24:51] motd message? Uninstall ntp on the LVS boxes? [19:25:05] ntp didn't cause the other 2 ;) [19:25:30] the third time that api has had machines depool and then it floods all the remaining machines, causing them to depool, loops forever [19:26:05] api? [19:26:23] (PS1) Mark Bergsma: Increase depool threshold of apaches to .9 [operations/puppet] - https://gerrit.wikimedia.org/r/84033 [19:27:02] mark: that's a good idea nevertheless, but it is a problem in general, isn't it?
[19:27:20] this means we can't handle, say, a 20% increase in (uncached) traffic [19:27:22] (CR) Mark Bergsma: [C: 2] Increase depool threshold of apaches to .9 [operations/puppet] - https://gerrit.wikimedia.org/r/84033 (owner: Mark Bergsma) [19:27:24] (PS3) Hashar: initial debian directory [operations/debs/python-gear] - https://gerrit.wikimedia.org/r/84022 [19:27:54] the api vip has been the one that keeps seeing the issues, hasn't it? [19:28:02] the one that has the most obvious symptoms [19:28:14] no [19:28:18] appservers [19:29:19] oh [19:29:21] that's right [19:29:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:31:10] (PS1) Ottomata: Matching kafka.default.erb with updated /etc/default/kafka in latest 0.8 deb. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/84040 [19:32:01] (PS2) Ottomata: Matching kafka.default.erb with updated /etc/default/kafka in latest 0.8 deb. [operations/puppet/kafka] - https://gerrit.wikimedia.org/r/84040 [19:32:18] (CR) Ottomata: [C: 2 V: 2] Matching kafka.default.erb with updated /etc/default/kafka in latest 0.8 deb.
[operations/puppet/kafka] - https://gerrit.wikimedia.org/r/84040 (owner: Ottomata) [19:34:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.756 second response time [19:35:08] !log Rebalanced eqiad apaches in PyBal weight, according to machine specs [19:35:11] Logged the message, Master [19:37:43] PROBLEM - Apache HTTP on mw1209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:38:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.399 second response time [19:39:33] PROBLEM - Apache HTTP on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:43] RECOVERY - Apache HTTP on mw1209 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.914 second response time [19:39:53] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:40:33] PROBLEM - Apache HTTP on mw1215 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:40:33] PROBLEM - Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:41:33] PROBLEM - Apache HTTP on mw1211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:41:53] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.488 second response time [19:41:53] PROBLEM - Apache HTTP on mw1220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:42:33] RECOVERY - Apache HTTP on mw1186 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.078 second response time [19:42:43] PROBLEM - Apache HTTP on mw1166 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:43] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.715 second response time [19:43:43] PROBLEM - Apache HTTP on mw1209 is CRITICAL: CRITICAL - Socket
timeout after 10 seconds [19:44:33] PROBLEM - Apache HTTP on mw1212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:33] PROBLEM - Apache HTTP on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:34] PROBLEM - Apache HTTP on mw1178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:43] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:44:43] RECOVERY - Apache HTTP on mw1209 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.494 second response time [19:44:53] PROBLEM - Apache HTTP on mw1182 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:24] RECOVERY - Apache HTTP on mw1212 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.913 second response time [19:45:34] RECOVERY - Apache HTTP on mw1215 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.756 second response time [19:45:34] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.717 second response time [19:45:34] PROBLEM - Apache HTTP on mw1219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:45:34] RECOVERY - Apache HTTP on mw1169 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.666 second response time [19:45:34] RECOVERY - Apache HTTP on mw1211 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.795 second response time [19:45:53] RECOVERY - Apache HTTP on mw1220 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.985 second response time [19:46:35] RECOVERY - Apache HTTP on mw1178 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.977 second response time [19:46:36] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:47:09] (03PS4) 10Hashar: initial debian directory [operations/debs/python-gear] - 10https://gerrit.wikimedia.org/r/84022 [19:47:33] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 
808 bytes in 8.424 second response time [19:47:33] PROBLEM - Apache HTTP on mw1163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:48:23] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.778 second response time [19:48:24] RECOVERY - Apache HTTP on mw1219 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.133 second response time [19:49:33] PROBLEM - Apache HTTP on mw1216 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:49:43] PROBLEM - Apache HTTP on mw1166 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:50:24] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.078 second response time [19:50:33] RECOVERY - Apache HTTP on mw1176 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.071 second response time [19:50:33] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.064 second response time [19:50:33] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: Connection refused [19:51:04] PROBLEM - Apache HTTP on mw1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:04] PROBLEM - Apache HTTP on mw1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:04] PROBLEM - Apache HTTP on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:04] PROBLEM - Apache HTTP on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:04] PROBLEM - Apache HTTP on mw1183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:04] PROBLEM - Apache HTTP on mw1188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:05] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:05] PROBLEM - Apache HTTP on mw1111 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:06] PROBLEM - Apache HTTP on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
[19:51:06] PROBLEM - Apache HTTP on mw1048 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:07] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:15] crap [19:51:23] :( [19:51:26] mark ? [19:51:34] PROBLEM - LVS HTTP IPv4 on wikinews-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:35] PROBLEM - LVS HTTPS IPv4 on wikinews-lb.esams.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:37] he's on it [19:51:39] it's coming back [19:51:41] it seems [19:51:41] as am I [19:51:44] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 62238 bytes in 5.505 second response time [19:52:10] RECOVERY - Apache HTTP on mw1050 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.815 second response time [19:52:10] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.916 second response time [19:52:19] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.559 second response time [19:52:19] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.088 second response time [19:52:19] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.361 second response time [19:52:19] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:52:29] RECOVERY - LVS HTTP IPv4 on wikinews-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.0 200 OK - 79568 bytes in 0.466 second response time [19:52:29] RECOVERY - LVS HTTPS IPv4 on wikinews-lb.esams.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 80099 bytes in 0.841 second response time [19:52:29] PROBLEM - Apache HTTP on mw1161 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:52:33] :) [19:52:39] PROBLEM - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is CRITICAL: CRITICAL - 
Socket timeout after 10 seconds [19:52:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:10] RECOVERY - Apache HTTP on mw1182 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.052 second response time [19:53:10] RECOVERY - Apache HTTP on mw1188 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.115 second response time [19:53:10] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.375 second response time [19:53:19] PROBLEM - Apache HTTP on mw1038 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:20] PROBLEM - Apache HTTP on mw1103 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:20] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:20] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:20] PROBLEM - Apache HTTP on mw1036 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:20] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [19:53:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [19:53:29] RECOVERY - LVS HTTPS IPv4 on wikivoyage-lb.pmtpa.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 757 bytes in 0.197 second response time [19:53:29] PROBLEM - Apache HTTP on mw1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:30] PROBLEM - Apache HTTP on mw1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:30] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:30] PROBLEM - Apache HTTP on mw1060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:30] PROBLEM - Apache HTTP on mw1071 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:31] PROBLEM - Apache HTTP 
on mw1062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:39] PROBLEM - Apache HTTP on mw1033 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:39] PROBLEM - Apache HTTP on mw1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:39] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:39] PROBLEM - Apache HTTP on mw1214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:39] PROBLEM - Apache HTTP on mw1218 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:40] PROBLEM - Apache HTTP on mw1166 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:40] PROBLEM - Apache HTTP on mw1174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:41] PROBLEM - Apache HTTP on mw1057 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:49] PROBLEM - Apache HTTP on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:49] PROBLEM - Apache HTTP on mw1177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:49] PROBLEM - Apache HTTP on mw1167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:53:49] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:54:10] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.056 second response time [19:54:10] RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [19:54:10] RECOVERY - Apache HTTP on mw1103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.009 second response time [19:54:19] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.530 second response time [19:54:19] RECOVERY - Apache HTTP on mw1048 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.816 second response time [19:54:19] RECOVERY - Apache HTTP on mw1036 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 808 bytes in 9.255 second response time [19:54:29] RECOVERY - Apache HTTP on mw1060 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.112 second response time [19:54:29] RECOVERY - Apache HTTP on mw1071 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.687 second response time [19:54:30] RECOVERY - Apache HTTP on mw1062 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.025 second response time [19:54:30] RECOVERY - Apache HTTP on mw1047 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.450 second response time [19:54:30] RECOVERY - Apache HTTP on mw1033 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.696 second response time [19:54:30] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.390 second response time [19:54:39] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.538 second response time [19:54:39] RECOVERY - Apache HTTP on mw1180 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.542 second response time [19:54:40] RECOVERY - Apache HTTP on mw1214 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.089 second response time [19:54:40] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.758 second response time [19:54:40] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.802 second response time [19:55:19] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.649 second response time [19:55:19] PROBLEM - Apache HTTP on mw1084 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:29] RECOVERY - Apache HTTP on mw1218 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.049 second response time [19:55:29] PROBLEM - Apache HTTP on mw1030 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:30] PROBLEM 
- Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:30] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:39] PROBLEM - Apache HTTP on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:39] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:39] PROBLEM - Apache HTTP on mw1076 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:39] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:39] PROBLEM - Apache HTTP on mw1032 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:40] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:40] PROBLEM - Apache HTTP on mw1028 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:41] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.277 second response time [19:55:41] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:42] PROBLEM - Apache HTTP on mw1083 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:55:49] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:09] RECOVERY - Apache HTTP on mw1042 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.061 second response time [19:56:10] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.394 second response time [19:56:10] RECOVERY - Apache HTTP on mw1106 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 2.485 second response time [19:56:10] RECOVERY - Apache HTTP on mw1084 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.492 second response time [19:56:29] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.373 second response time [19:56:29] RECOVERY - Apache 
HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.291 second response time [19:56:30] RECOVERY - Apache HTTP on mw1076 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.948 second response time [19:56:30] RECOVERY - Apache HTTP on mw1065 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.381 second response time [19:56:30] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.410 second response time [19:56:30] RECOVERY - Apache HTTP on mw1032 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.561 second response time [19:56:39] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:39] PROBLEM - Apache HTTP on mw1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:39] PROBLEM - Apache HTTP on mw1107 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:39] PROBLEM - Apache HTTP on mw1079 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:39] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:40] PROBLEM - Apache HTTP on mw1058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:40] RECOVERY - Apache HTTP on mw1083 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.707 second response time [19:56:41] RECOVERY - Apache HTTP on mw1057 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.272 second response time [19:56:49] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.429 second response time [19:56:49] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.985 second response time [19:57:30] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [19:57:30] RECOVERY - Apache HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes 
in 0.340 second response time [19:57:30] RECOVERY - Apache HTTP on mw1058 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.398 second response time [19:57:30] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.325 second response time [19:57:30] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.764 second response time [19:57:30] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.494 second response time [19:57:30] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.536 second response time [19:57:31] RECOVERY - Apache HTTP on mw1107 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.362 second response time [19:57:31] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.447 second response time [19:57:32] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.055 second response time [19:57:32] RECOVERY - Apache HTTP on mw1030 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.057 second response time [19:57:33] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.448 second response time [19:57:33] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [19:57:39] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.070 second response time [20:10:21] all quiet? :) [20:10:27] * jeremyb gets ready to revert /topic [20:17:20] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [20:20:30] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:28:53] whew, quiet for 20 minutes [20:29:24] did you get some food? 
[20:37:45] RobH: assuming emergency is over, and given you are in RT duty: https://gerrit.wikimedia.org/r/#/c/78944/ [20:38:20] gerrit thing +1'd by Chad, needed to help new committers [20:49:05] !log reedy synchronized php-1.22wmf16/extensions/MoodBar/ [20:49:08] Logged the message, Master [20:49:23] Sep 12 20:48:45 10.64.0.64 apache2[20992]: [error] [client 10.64.32.107] client denied by server configuration: /usr/local/apache/common/docroot/default/server-status [20:49:23] Sep 12 20:48:45 10.64.16.164 apache2[6112]: [error] [client 10.64.0.103] client denied by server configuration: /usr/local/apache/common/docroot/default/server-status [21:07:58] fyi, just got an HTTP timeout hitting search backend. was foundationwiki. ([[special:search]]) [21:11:46] Hello IRC Wikimedia ! [21:12:09] is there someone around ? I got a "technical" question... [21:12:22] !ask [21:12:22] Hi, how can we help you? Just ask your question. [21:12:31] (03PS3) 10Reedy: Enable CleanChanges extension on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81678 (owner: 10Nemo bis) [21:12:33] !ask [21:12:34] Hi, how can we help you? Just ask your question. [21:13:31] Hello wm-bot, I would like to know if there are ".zim" file for the wikisource, as it exists for the Wikipedia. We would like to serve Wikisource locally in our networks (very slow network with a satellite link in the amazonian forest) [21:13:49] (03CR) 10Reedy: [C: 032] Enable CleanChanges extension on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81678 (owner: 10Nemo bis) [21:13:59] (03Merged) 10jenkins-bot: Enable CleanChanges extension on Meta-Wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/81678 (owner: 10Nemo bis) [21:14:01] just one language? [21:14:02] it would save a lot of bandwidth and give user access to a lot of books [21:14:04] yes [21:14:07] and why just wikisource? 
[21:14:14] fr_FR, and maybe pt_BR [21:14:37] because I already downloaded the wikipedia FR (15GB, a big .zim file that we use with "kiwix") [21:15:09] fr_FR is africa? [21:15:15] i'm assuming france is not on satellite [21:15:26] the idea is to serve public domain books to our users (mainly french / brazilian / teko speakers) [21:15:27] !log reedy synchronized wmf-config/extension-list 'Enable CleanChanges' [21:15:31] Logged the message, Master [21:15:47] we're actually not in france mainland [21:15:49] jeremyb: https://en.wikipedia.org/wiki/File:New-Map-Francophone_World.PNG ;) [21:15:58] but in French Guiana, between Brazil and Surinam, in South America [21:17:12] that's why we can only connect through satellite [21:17:15] MatmaRex: mhhhhhmmmm [21:17:30] gf973: is this sneakernet? [21:17:33] (no ADSL except in a few cities near the coast) [21:17:44] sneakernet ? [21:17:49] gf973: It would look like there aren't wikisource dumps [21:18:25] err.. [21:19:11] You might want to try in #kiwix [21:19:16] !log reedy Started syncing Wikimedia installation... : Update l10n cache for CleanChanges [21:19:19] Logged the message, Master [21:19:28] ok, thank you Reedy, I will ask them [21:19:51] I suspect it's somewhat a case of it not having been done [21:20:27] oh, i didn't realize what channel we were in [21:21:02] this discussion really belonged in #wikimedia-tech rather than here (but the redirect to #kiwix is even better) [21:22:00] gf973: sneakernet means you download the zim file somewhere that's not your target location and then carry it physically on some media/drive to wherever it needs to go [21:23:01] yes, btw that's how we proceed, carrying zim files by canoe on the amazonian rivers [21:23:08] :) [21:23:12] (03PS5) 10Hashar: initial debian directory [operations/debs/python-gear] - 10https://gerrit.wikimedia.org/r/84022 [21:23:13] hah, a canoe! 
[21:23:16] but, for the wikisource I can't find it [21:23:25] maybe with bare feet [21:23:40] Like I say, it's presumably more of a case it hasn't been done, than can't [21:24:01] gf973: you don't seem to have joined #kiwix yet? [21:24:18] /join #kiwix [21:24:18] no, i'm trying [21:24:22] but i arrive on "#" [21:24:24] type that exactly and hit enter [21:24:26] Though, for Wikisource it might need a few updates [21:24:28] I'm on the "web IRC" [21:24:32] i know [21:24:33] and i type /join #kiwix [21:24:34] type what i typed [21:24:40] but I land on "#" [21:24:46] ok [21:25:36] a modified form of sneakernet is to drop it in the mail [21:25:51] (03PS1) 10Lcarr: fixing the ntp server ranges [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 [21:26:11] we do "sneakernet" with canoe, small plane, helicopters here, but no mail service :-) [21:26:13] so, anyone want to confirm that this probably won't kill all the hosts? ^^ paravoid ? [21:26:47] (03CR) 10Mark Bergsma: "We should really be pulling those ranges from network.pp..." [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 (owner: 10Lcarr) [21:27:42] LeslieCarr: we need some cat pix to go with it [21:27:48] hehe [21:28:17] ah, i should make another commit with fixing up network.pp [21:28:28] LeslieCarr: why do you have a submodule change in there? [21:28:30] !log reedy synchronized wmf-config/ [21:28:33] Logged the message, Master [21:28:35] (03CR) 10Faidon Liambotis: [C: 04-1] "Why the cdh4 submodule update?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 (owner: 10Lcarr) [21:29:19] !log reedy Finished syncing Wikimedia installation... 
: Update l10n cache for CleanChanges [21:29:22] Logged the message, Master [21:29:23] hrm, need to figure out how to get rid of that [21:29:51] !log reedy synchronized wmf-config/ [21:33:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:34:15] it's presumably tracking a branch rather than being set to a commit [21:34:23] git checkout ab1234 [21:34:26] * Reedy looks [21:34:50] (03PS1) 10Lcarr: fixing up the ntp ip list [operations/puppet] - 10https://gerrit.wikimedia.org/r/84092 [21:34:52] (03Abandoned) 10Lcarr: fixing the ntp server ranges [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 (owner: 10Lcarr) [21:35:20] ok, i failed to make the submodule change remove itself from that :) [21:35:22] LeslieCarr: why abandon? [21:35:24] so patchset #2 [21:35:30] because it was way easier [21:35:38] * jeremyb cries [21:35:39] heh :D [21:35:44] please abandon the new one? [21:35:45] We'll run out of numbers [21:35:49] i'll fix it [21:35:53] okay [21:36:01] only if you tell me the steps you took [21:36:05] so i can do it right next time [21:36:08] (03Abandoned) 10Lcarr: fixing up the ntp ip list [operations/puppet] - 10https://gerrit.wikimedia.org/r/84092 (owner: 10Lcarr) [21:36:31] giving me extra work! :P [21:36:40] cdh4 is empty? 
[21:37:26] oh, wow, leslie just had an old checkout [21:37:33] ## production...origin/production [ahead 1, behind 108] [21:37:33] it's a git submodule, and git submodules are crazy [21:37:51] sept 5th [21:37:52] running git submodule update --init --recursive changes nothing in it for me [21:37:57] LeslieCarr: git checkout production && git pull [21:37:58] ;) [21:40:49] LeslieCarr: i don't have a restore/unabandon button [21:40:54] for the original [21:41:17] Just amend and push [21:41:19] IIRC it works [21:41:22] (03Restored) 10Lcarr: fixing the ntp server ranges [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 (owner: 10Lcarr) [21:41:23] there we go [21:41:26] unabandoned [21:41:26] Reedy: not if it's abandoned [21:41:32] (03PS2) 10Jeremyb: fixing up the ntp ip list [operations/puppet] - 10https://gerrit.wikimedia.org/r/84091 (owner: 10Lcarr) [21:41:40] Amend & push doesn't work for abandoned changes, at least it didn't use to [21:41:49] ok [21:42:33] <^demon> Abandoned & merged can't be amended. They're considered closed by gerrit. [21:42:35] anyway, the fix is: do whatever you did to generate the new one but use the *old* change-id line in the commit msg. then `git push origin HEAD:refs/publish/production` (or maybe something else would work but that push cmd is my standard invocation) [21:42:39] LeslieCarr: ^ [21:42:59] <^demon> jeremyb: refs/for/*, less typing :) [21:43:08] ^demon: that's deprecated, no? [21:43:18] anyway, yes i knew that :) [21:43:32] <^demon> They un-deprecated because nobody was changing docs and nobody really knew why it was being changed to begin with :) [21:43:34] yes, "closed" does seem to be the term: ! [remote rejected] HEAD -> refs/publish/production (change 84091 closed) [21:43:35] ok [21:43:40] hah [21:43:40] thanks [21:43:51] to differentiate vs. drafts i assumed [21:43:57] LeslieCarr: do you follow? [21:44:14] yeah [21:44:17] <^demon> jeremyb: Yeah that was the original plan. 
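Note on the fix described at 21:42:35: Gerrit matches a pushed commit to an existing change by the `Change-Id:` line in the commit message, so reusing the old line makes the push a new patchset on the original change. A minimal Python sketch of that message edit follows; `reuse_change_id` and the example IDs are hypothetical, made up for illustration, and are not part of Gerrit or git.

```python
import re

# Matches a Gerrit-style trailer line such as "Change-Id: I0123abcd...".
CHANGE_ID_LINE = re.compile(r"^Change-Id: .*$", re.MULTILINE)


def reuse_change_id(message: str, old_change_id: str) -> str:
    """Return `message` with its Change-Id line set to `old_change_id`.

    Hypothetical helper: replaces the first existing Change-Id line, or
    appends one as a final paragraph if the message has none. It does not
    try to merge with other trailers that may already be present.
    """
    new_line = f"Change-Id: {old_change_id}"
    if CHANGE_ID_LINE.search(message):
        # Use a function as replacement so the ID is inserted literally.
        return CHANGE_ID_LINE.sub(lambda _m: new_line, message, count=1)
    return message.rstrip("\n") + "\n\n" + new_line + "\n"


if __name__ == "__main__":
    # Both IDs below are placeholders, not real Gerrit change IDs.
    msg = "fixing up the ntp ip list\n\nChange-Id: Ibbbb\n"
    print(reuse_change_id(msg, "Iaaaa"))
```

In the chat the same edit was done by hand during `git commit -a --amend`, followed by a push to `refs/publish/production` (or `refs/for/production`, per ^demon). As noted at 21:42:33, this only works while the original change is open; abandoned and merged changes are closed and reject the push.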
[21:44:40] LeslieCarr: so i did git reset --hard new && git commit -a --amend && change the change-id && save && push [21:45:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.012 second response time [21:48:49] ok, so, will this take everything down paravoid? ;) [21:48:56] i wink, but i am also scared [21:50:15] oh, do i get to share the blame if it breaks now? :) [21:50:23] !log upgrading to 11.4r9 on cr1-ulsfo/cr2-ulsfo [21:50:28] Logged the message, Mistress of the network gear. [21:50:29] sure! [21:51:10] no, that won't take everything down, especially since it only changes on the ntp servers [21:52:07] ok, and yes, let me open a ticket to have that actually draw from network.pp [22:01:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:05:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.317 second response time [22:09:14] (03PS1) 10Lcarr: added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 [22:10:32] (03PS2) 10Lcarr: added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 [22:13:02] (03PS3) 10Lcarr: added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 [22:13:38] (03CR) 10Lcarr: [C: 032] added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 (owner: 10Lcarr) [22:14:43] that's wrong leslie... [22:15:06] (03CR) 10Mark Bergsma: [C: 04-2] "private network under public section..." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 (owner: 10Lcarr) [22:15:42] heh, yes it is [22:20:01] (03PS1) 10Hashar: missing A entries for LVS ethifaces 10.64.17.1-6 [operations/dns] - 10https://gerrit.wikimedia.org/r/84098 [22:20:15] (03PS4) 10Lcarr: added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 [22:20:31] (03CR) 10jenkins-bot: [V: 04-1] added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 (owner: 10Lcarr) [22:21:02] ok, where was the comma i forgot [22:22:08] (03PS1) 10Ottomata: Installing tools-log4j.properties in Makefile. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/84099 [22:22:25] (03CR) 10Ottomata: [C: 032 V: 032] Installing tools-log4j.properties in Makefile. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/84099 (owner: 10Ottomata) [22:24:30] (03PS5) 10Lcarr: added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 [22:25:41] (03CR) 10Lcarr: [C: 032] added all the new network subnets [operations/puppet] - 10https://gerrit.wikimedia.org/r/84096 (owner: 10Lcarr) [22:26:34] (03CR) 10Hashar: "Can be build using:" [operations/debs/python-gear] - 10https://gerrit.wikimedia.org/r/84022 (owner: 10Hashar) [22:26:43] *built [22:28:17] ori-l: thank you for paying attention :-] [22:28:36] ori-l: I should attend your "english as a second language" course. 
[22:28:43] might well take some lessons myself [22:29:33] i wasn't very good at it -- i got good reviews by being incredibly lenient with homework :P [22:31:11] (03CR) 10Hashar: "I can provide any .deb package because the dh_builddeb helper dies out under git buildpackage:" [operations/debs/python-gear] - 10https://gerrit.wikimedia.org/r/84022 (owner: 10Hashar) [22:43:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:46:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.906 second response time [23:09:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:10:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [23:16:13] (03PS1) 10Ricordisamoa: Revert "Enable CleanChanges extension on Meta-Wiki" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/84102 [23:20:21] (03PS2) 10Yuvipanda: Add mosh to bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/84024 [23:21:43] (03PS3) 10coren: Add mosh to bastion hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/84024 (owner: 10Yuvipanda) [23:22:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:20] (03CR) 10coren: [C: 032] "LGTM" [operations/puppet] - 10https://gerrit.wikimedia.org/r/84024 (owner: 10Yuvipanda) [23:24:30] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [23:27:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:28:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.708 second response time [23:33:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 
10 seconds [23:35:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.738 second response time [23:36:23] (03PS3) 10Ottomata: Installing java openjdk 7 on analytics nodes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/82430 [23:36:36] (03CR) 10Ottomata: [C: 032 V: 032] Installing java openjdk 7 on analytics nodes. [operations/puppet] - 10https://gerrit.wikimedia.org/r/82430 (owner: 10Ottomata) [23:39:26] (03PS1) 10Yuvipanda: Remove redundant operatingsystem check for Ubuntu [operations/puppet] - 10https://gerrit.wikimedia.org/r/84104 [23:39:44] (03CR) 10jenkins-bot: [V: 04-1] Remove redundant operatingsystem check for Ubuntu [operations/puppet] - 10https://gerrit.wikimedia.org/r/84104 (owner: 10Yuvipanda) [23:40:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:40:55] (03PS2) 10Yuvipanda: Remove redundant operatingsystem check for Ubuntu [operations/puppet] - 10https://gerrit.wikimedia.org/r/84104 [23:41:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.798 second response time [23:45:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:46:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.650 second response time [23:46:55] PROBLEM - DPKG on analytics1019 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:47:55] RECOVERY - DPKG on analytics1019 is OK: All packages OK [23:49:55] PROBLEM - DPKG on analytics1013 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:50:55] RECOVERY - DPKG on analytics1013 is OK: All packages OK [23:51:25] PROBLEM - DPKG on analytics1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [23:52:26] RECOVERY - DPKG on analytics1008 is OK: All packages OK [23:59:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - 
Socket timeout after 10 seconds