[00:05:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:19:33] PROBLEM - NTP on search26 is CRITICAL: NTP CRITICAL: Offset -1.00406301 secs [00:19:42] PROBLEM - NTP on search17 is CRITICAL: NTP CRITICAL: Offset unknown [00:19:51] PROBLEM - NTP on search33 is CRITICAL: NTP CRITICAL: Offset -1.002327919 secs [00:19:51] PROBLEM - NTP on search18 is CRITICAL: NTP CRITICAL: Offset -1.003461003 secs [00:19:51] PROBLEM - NTP on search15 is CRITICAL: NTP CRITICAL: Offset -1.002972484 secs [00:19:51] PROBLEM - NTP on vanadium is CRITICAL: NTP CRITICAL: Offset -1.001446962 secs [00:19:51] PROBLEM - NTP on search22 is CRITICAL: NTP CRITICAL: Offset -1.002938747 secs [00:19:52] PROBLEM - NTP on search24 is CRITICAL: NTP CRITICAL: Offset -1.004598022 secs [00:19:52] PROBLEM - NTP on stat1 is CRITICAL: NTP CRITICAL: Offset -1.000285387 secs [00:20:00] PROBLEM - NTP on search27 is CRITICAL: NTP CRITICAL: Offset -1.002937198 secs [00:20:00] PROBLEM - NTP on search14 is CRITICAL: NTP CRITICAL: Offset -1.003344297 secs [00:20:00] PROBLEM - NTP on search35 is CRITICAL: NTP CRITICAL: Offset -1.003415585 secs [00:20:00] PROBLEM - NTP on db63 is CRITICAL: NTP CRITICAL: Offset unknown [00:20:00] PROBLEM - NTP on sockpuppet is CRITICAL: NTP CRITICAL: Offset -1.002735853 secs [00:20:01] PROBLEM - NTP on search23 is CRITICAL: NTP CRITICAL: Offset -1.009204268 secs [00:20:09] PROBLEM - NTP on ms10 is CRITICAL: NTP CRITICAL: Offset -1.002880216 secs [00:20:09] PROBLEM - NTP on virt1004 is CRITICAL: NTP CRITICAL: Offset -1.00008738 secs [00:20:09] PROBLEM - NTP on cp1041 is CRITICAL: NTP CRITICAL: Offset -1.001003861 secs [00:20:09] PROBLEM - NTP on search25 is CRITICAL: NTP CRITICAL: Offset -1.004829764 secs [00:20:09] PROBLEM - NTP on nitrogen is CRITICAL: NTP CRITICAL: Offset -1.00122726 secs [00:20:18] PROBLEM - NTP on search28 is CRITICAL: NTP CRITICAL: Offset -1.002691388 secs [00:20:18] PROBLEM - NTP on search36 is CRITICAL: NTP CRITICAL: Offset -1.003834605 secs [00:20:18] PROBLEM - NTP on search21 is CRITICAL: NTP CRITICAL: Offset -1.003881931 secs [00:20:27] PROBLEM - NTP on searchidx2 is CRITICAL: NTP CRITICAL: Offset -1.003004909 secs [00:20:36] PROBLEM - NTP on search31 is CRITICAL: NTP CRITICAL: Offset -1.001902938 secs [00:20:36] PROBLEM - NTP on virt1007 is CRITICAL: NTP CRITICAL: Offset -1.002343535 secs [00:20:36] PROBLEM - NTP on hydrogen is CRITICAL: NTP CRITICAL: Offset -1.002717495 secs [00:20:36] PROBLEM - NTP on virt1008 is CRITICAL: NTP CRITICAL: Offset -1.001419187 secs [00:20:36] PROBLEM - NTP on srv281 is CRITICAL: NTP CRITICAL: Offset -1.003268719 secs [00:20:37] PROBLEM - NTP on stafford is CRITICAL: NTP CRITICAL: Offset -1.00297296 secs [00:20:37] PROBLEM - NTP on search29 is CRITICAL: NTP CRITICAL: Offset -1.003409266 secs [00:20:38] PROBLEM - NTP on ms-be1002 is CRITICAL: NTP CRITICAL: Offset -1.001787305 secs [00:20:38] PROBLEM - NTP on ms-be1001 is CRITICAL: NTP CRITICAL: Offset -1.001975775 secs [00:20:39] PROBLEM - NTP on virt1003 is CRITICAL: NTP CRITICAL: Offset -1.000443697 secs [00:20:39] PROBLEM - NTP on search13 is CRITICAL: NTP CRITICAL: Offset -1.002311707 secs [00:20:45] PROBLEM - NTP on search34 is CRITICAL: NTP CRITICAL: Offset unknown [00:20:45] PROBLEM - NTP on virt1005 is CRITICAL: NTP CRITICAL: Offset -1.003082275 secs [00:20:54] PROBLEM - NTP on search20 is CRITICAL: NTP CRITICAL: Offset -1.003569365 secs [00:20:54] PROBLEM - NTP on manutius is CRITICAL: NTP CRITICAL: Offset -1.003699183 secs [00:20:54] PROBLEM 
- NTP on neon is CRITICAL: NTP CRITICAL: Offset -1.000853658 secs [00:20:54] PROBLEM - NTP on virt1002 is CRITICAL: NTP CRITICAL: Offset -1.003281355 secs [00:20:54] PROBLEM - NTP on search19 is CRITICAL: NTP CRITICAL: Offset -1.003387332 secs [00:21:03] PROBLEM - NTP on search16 is CRITICAL: NTP CRITICAL: Offset -1.002802253 secs [00:21:03] PROBLEM - NTP on cp1042 is CRITICAL: NTP CRITICAL: Offset -1.001209378 secs [00:21:03] PROBLEM - NTP on search30 is CRITICAL: NTP CRITICAL: Offset -1.00454247 secs [00:21:03] PROBLEM - NTP on chromium is CRITICAL: NTP CRITICAL: Offset -1.002126813 secs [00:21:03] PROBLEM - NTP on db1045 is CRITICAL: NTP CRITICAL: Offset -1.001263499 secs [00:21:12] PROBLEM - NTP on virt1001 is CRITICAL: NTP CRITICAL: Offset -1.001881003 secs [00:21:21] PROBLEM - NTP on capella is CRITICAL: NTP CRITICAL: Offset -1.00713253 secs [00:21:48] PROBLEM - NTP on ms-be1006 is CRITICAL: NTP CRITICAL: Offset -1.000199318 secs [00:24:21] PROBLEM - NTP on virt1004 is CRITICAL: NTP CRITICAL: Offset -1.001159787 secs [00:25:33] RECOVERY - NTP on db63 is OK: NTP OK: Offset -0.003840208054 secs [00:26:09] PROBLEM - NTP on ms-be1006 is CRITICAL: NTP CRITICAL: Offset -1.001355648 secs [00:31:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.830 seconds [00:31:51] PROBLEM - NTP on ms-be1006 is CRITICAL: NTP CRITICAL: Offset -1.00000453 secs [00:32:51] New review: Krinkle; "Can we do that in our config, or does it have to happen upstream?" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/16841 [00:33:10] !log leap second event plus one month is causing an apparent 1s step in time reported by linne/dobson as seen by some clients, causing nagios errors etc. Will step. [00:33:19] Logged the message, Master [00:35:00] RECOVERY - NTP on search20 is OK: NTP OK: Offset 0.002711892128 secs [00:35:09] RECOVERY - NTP on search15 is OK: NTP OK: Offset -0.0005422830582 secs [00:35:09] PROBLEM - NTP on stat1 is CRITICAL: NTP CRITICAL: Offset -1.000931382 secs [00:35:18] RECOVERY - NTP on search18 is OK: NTP OK: Offset 0.002357125282 secs [00:35:27] RECOVERY - NTP on search14 is OK: NTP OK: Offset 0.002243876457 secs [00:35:45] RECOVERY - NTP on searchidx2 is OK: NTP OK: Offset 0.002533197403 secs [00:37:28] New review: Demon; "I'm fine with merging it--ask an opsen to do the honors :)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/16841 [00:38:00] PROBLEM - NTP on stat1 is CRITICAL: NTP CRITICAL: Offset -1.004558086 secs [00:38:54] RECOVERY - NTP on search34 is OK: NTP OK: Offset -0.0005459785461 secs [00:39:03] RECOVERY - NTP on search26 is OK: NTP OK: Offset 0.0005159378052 secs [00:39:03] RECOVERY - NTP on ms-be1001 is OK: NTP OK: Offset 0.003136396408 secs [00:39:12] RECOVERY - NTP on manutius is OK: NTP OK: Offset 0.002767443657 secs [00:39:12] RECOVERY - NTP on search30 is OK: NTP OK: Offset 0.002802610397 secs [00:39:12] RECOVERY - NTP on search17 is OK: NTP OK: Offset -0.001473665237 secs [00:39:21] RECOVERY - NTP on cp1042 is OK: NTP OK: Offset 0.003164887428 secs [00:39:30] RECOVERY - NTP on search33 is OK: NTP OK: Offset 0.002197980881 secs [00:39:30] RECOVERY - NTP on neon is OK: NTP OK: Offset 0.003155469894 secs [00:39:30] RECOVERY - NTP on virt1001 is OK: NTP OK: Offset 0.003191590309 secs [00:39:30] RECOVERY - NTP on search24 is OK: NTP OK: Offset 0.002688527107 secs [00:39:39] RECOVERY - NTP on ms10 is OK: NTP OK: Offset 5.280971527e-05 secs [00:39:39] RECOVERY - NTP on search25 is OK: 
NTP OK: Offset 0.002616643906 secs [00:39:48] RECOVERY - NTP on cp1041 is OK: NTP OK: Offset 0.003150343895 secs [00:39:57] RECOVERY - NTP on sockpuppet is OK: NTP OK: Offset 0.002510428429 secs [00:40:15] RECOVERY - NTP on hydrogen is OK: NTP OK: Offset 0.003090023994 secs [00:40:24] RECOVERY - NTP on ms-be1002 is OK: NTP OK: Offset 0.00305891037 secs [00:41:09] PROBLEM - NTP on virt1004 is CRITICAL: NTP CRITICAL: Offset unknown [00:41:18] RECOVERY - NTP on search23 is OK: NTP OK: Offset -0.002405881882 secs [00:41:18] RECOVERY - NTP on search21 is OK: NTP OK: Offset -0.001321434975 secs [00:42:57] RECOVERY - NTP on search13 is OK: NTP OK: Offset -0.00150513649 secs [00:42:57] RECOVERY - NTP on virt1008 is OK: NTP OK: Offset 0.003241539001 secs [00:43:06] RECOVERY - NTP on virt1007 is OK: NTP OK: Offset 0.003159165382 secs [00:44:36] RECOVERY - NTP on search16 is OK: NTP OK: Offset -0.002539992332 secs [00:46:06] RECOVERY - NTP on virt1003 is OK: NTP OK: Offset -0.0004901885986 secs [00:47:00] RECOVERY - NTP on search36 is OK: NTP OK: Offset -0.001963496208 secs [00:47:00] RECOVERY - NTP on search28 is OK: NTP OK: Offset 0.004057884216 secs [00:48:30] RECOVERY - NTP on srv281 is OK: NTP OK: Offset -0.0007045269012 secs [00:48:48] RECOVERY - NTP on search19 is OK: NTP OK: Offset -0.002653121948 secs [00:48:57] RECOVERY - NTP on search29 is OK: NTP OK: Offset 0.000510931015 secs [00:49:51] RECOVERY - NTP on search31 is OK: NTP OK: Offset -0.001265764236 secs [00:50:09] RECOVERY - NTP on stafford is OK: NTP OK: Offset -0.002719640732 secs [00:52:15] RECOVERY - NTP on virt1004 is OK: NTP OK: Offset -0.0001796483994 secs [00:53:27] RECOVERY - NTP on db1045 is OK: NTP OK: Offset -0.0007469654083 secs [00:53:45] RECOVERY - NTP on search22 is OK: NTP OK: Offset -0.001126885414 secs [00:54:03] RECOVERY - NTP on nitrogen is OK: NTP OK: Offset -0.00114107132 secs [00:56:00] RECOVERY - NTP on virt1002 is OK: NTP OK: Offset -0.002690076828 secs [00:56:09] RECOVERY - NTP on chromium is OK: NTP OK: Offset -0.001908540726 secs [00:56:18] RECOVERY - NTP on search27 is OK: NTP OK: Offset 0.003482937813 secs [00:56:27] RECOVERY - NTP on search35 is OK: NTP OK: Offset -0.006003856659 secs [00:57:21] PROBLEM - NTP on search20 is CRITICAL: NTP CRITICAL: Offset unknown [00:57:21] RECOVERY - NTP on virt1005 is OK: NTP OK: Offset -0.001603364944 secs [00:59:09] RECOVERY - NTP on capella is OK: NTP OK: Offset -0.001192092896 secs [01:01:15] PROBLEM - NTP on search26 is CRITICAL: NTP CRITICAL: Offset unknown [01:01:24] PROBLEM - NTP on hydrogen is CRITICAL: NTP CRITICAL: Offset unknown [01:01:42] PROBLEM - NTP on search15 is CRITICAL: NTP CRITICAL: Offset unknown [01:02:37] RECOVERY - NTP on search26 is OK: NTP OK: Offset 0.002385258675 secs [01:02:54] RECOVERY - NTP on search20 is OK: NTP OK: Offset 0.003555178642 secs [01:03:12] RECOVERY - NTP on search15 is OK: NTP OK: Offset 0.00204539299 secs [01:04:24] RECOVERY - NTP on stat1 is OK: NTP OK: Offset 0.0004841089249 secs [01:04:51] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:12] PROBLEM - NTP on virt1001 is CRITICAL: NTP CRITICAL: Offset unknown [01:08:18] PROBLEM - NTP on manutius is CRITICAL: NTP CRITICAL: Offset unknown [01:14:45] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.036 seconds [01:18:21] RECOVERY - NTP on ms-be1006 is OK: NTP OK: Offset -1.037120819e-05 secs [01:20:54] RECOVERY - NTP on hydrogen is OK: NTP OK: Offset -0.0009303092957 
secs [01:24:39] RECOVERY - NTP on vanadium is OK: NTP OK: Offset -0.0007821321487 secs [01:29:27] RECOVERY - NTP on manutius is OK: NTP OK: Offset -0.0008246898651 secs [01:42:12] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 263 seconds [01:42:48] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 300 seconds [01:43:06] RECOVERY - NTP on virt1001 is OK: NTP OK: Offset -0.0001261234283 secs [01:44:07] New patchset: Catrope; "Fix the import sources on the Wikimania wikis" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17158 [01:48:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:48:57] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 671s [01:53:27] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 6 seconds [01:54:39] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 18s [01:55:15] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 10 seconds [01:57:03] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.408 seconds [02:30:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:06] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [02:39:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.049 seconds [03:03:57] RECOVERY - swift-account-replicator on ms-be1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [03:21:03] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [03:46:07] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [03:54:57] TimStarling: no, scap's in puppet:files/misc/scripts or something like that [03:56:01] but not scap-1 [03:57:57] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/115635 [03:57:59] oh, i need to go reread scrollback for a 3rd time [06:20:09] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [06:27:03] PROBLEM - Puppet freshness on nickel is CRITICAL: Puppet has not run in the last 10 hours [06:29:09] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [06:30:03] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [06:30:03] PROBLEM - Puppet freshness on ssl1 is CRITICAL: Puppet has not run in the last 10 hours [06:31:06] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [06:31:06] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: Puppet has not run in the last 10 hours [06:31:06] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours [06:31:06] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [06:31:06] PROBLEM - Puppet freshness on williams is CRITICAL: Puppet has not run in the last 10 hours [06:32:09] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours [06:32:09] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [06:32:09] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [06:32:09] PROBLEM - Puppet freshness on kaulen 
is CRITICAL: Puppet has not run in the last 10 hours [06:32:09] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [06:32:10] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [06:32:10] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [06:32:11] PROBLEM - Puppet freshness on ssl4 is CRITICAL: Puppet has not run in the last 10 hours [06:33:03] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours [06:33:03] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [06:33:03] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [06:33:03] PROBLEM - Puppet freshness on sanger is CRITICAL: Puppet has not run in the last 10 hours [06:33:03] PROBLEM - Puppet freshness on virt7 is CRITICAL: Puppet has not run in the last 10 hours [06:34:07] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [06:34:07] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: Puppet has not run in the last 10 hours [06:34:07] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: Puppet has not run in the last 10 hours [06:35:09] PROBLEM - Puppet freshness on ekrem is CRITICAL: Puppet has not run in the last 10 hours [06:35:09] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [06:35:09] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [06:35:09] PROBLEM - Puppet freshness on ssl3 is CRITICAL: Puppet has not run in the last 10 hours [06:35:09] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [06:36:03] PROBLEM - Puppet freshness on ssl2 is CRITICAL: Puppet has not run in the last 10 hours [06:38:55] PROBLEM - Puppet freshness on argon is CRITICAL: Puppet has not run in the last 10 hours [06:38:55] PROBLEM - Puppet freshness on virt6 is CRITICAL: Puppet has not run in the last 10 hours [06:38:55] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [06:40:07] PROBLEM - Puppet freshness on virt8 is CRITICAL: Puppet has not run in the last 10 hours [06:42:04] PROBLEM - Puppet freshness on virt5 is CRITICAL: Puppet has not run in the last 10 hours [06:42:04] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [06:46:07] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [06:46:07] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [06:58:07] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [07:19:07] PROBLEM - Puppet freshness on calcium is CRITICAL: Puppet has not run in the last 10 hours [08:57:31] RECOVERY - check_job_queue on spence is OK: JOBQUEUE OK - all job queues below 10,000 [08:58:07] RECOVERY - check_job_queue on neon is OK: JOBQUEUE OK - all job queues below 10,000 [10:31:25] PROBLEM - NTP peers on linne is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [10:34:07] RECOVERY - NTP peers on linne is OK: NTP OK: Offset 0.000828 secs [10:35:19] PROBLEM - NTP peers on dobson is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [10:36:40] RECOVERY - NTP peers on dobson is OK: NTP OK: Offset -0.0009 secs [11:51:42] New patchset: Mark Bergsma; "Restart the NTP client if hit by the leap second bug" 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/17176 [11:52:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17176 [11:52:53] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17176 [11:56:10] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [11:58:07] PROBLEM - Puppet freshness on cp1032 is CRITICAL: Puppet has not run in the last 10 hours [12:38:10] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [12:59:22] <^demon> jeremyb: I fixed All-Projects for you yesterday :) [13:22:07] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [13:33:37] Change abandoned: Demon; "Not really as necessary as it was before the database move--not going to bother dealing with this." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13356 [13:47:10] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:51:43] New patchset: Hashar; "basic README introducing our files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16035 [13:53:49] New patchset: Hashar; "basic README introducing our files" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16035 [13:54:03] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16035 [14:24:08] ^demon: woot, danke [14:58:00] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [14:58:38] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [15:06:50] I keep turning puppet off on virt1002 (service puppet stop) but whenever I return to it after leaving it unattended I find puppet running again. [15:07:05] Is there some external service that goes through a list of servers and ensures that puppet is always up? [15:07:31] andrewbogott: cron [15:07:41] Oh, of course. [15:07:49] * andrewbogott breaks cron [15:07:59] andrewbogott: try START=no @ /etc/default/puppet [15:08:01] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [15:08:39] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [15:09:55] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [15:10:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [15:10:36] whoo [15:43:21] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [15:43:55] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [15:44:26] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [15:44:44] hehe [15:45:04] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [16:05:10] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:05:44] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [16:06:21] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:06:59] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/13484 [16:08:05] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:08:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [16:21:04] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [16:28:08] PROBLEM - Puppet freshness on nickel is CRITICAL: Puppet has not run in the last 10 hours [16:28:49] New patchset: Reedy; "Bug 38905 - ShortUrl does not work on non wikipedia projects" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/17191 [16:30:04] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours [16:30:12] New patchset: Reedy; "Bug 38905 - ShortUrl does not work on non wikipedia projects" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/17191 [16:30:35] maplebed: ^ [16:31:07] PROBLEM - Puppet freshness on ssl1 is CRITICAL: Puppet has not run in the last 10 hours [16:31:07] PROBLEM - Puppet freshness on virt4 is CRITICAL: Puppet has not run in the last 10 hours [16:32:10] PROBLEM - Puppet freshness on nfs2 is CRITICAL: Puppet has not run in the last 10 hours [16:32:10] PROBLEM - Puppet freshness on ssl1003 is CRITICAL: Puppet has not run in the last 10 hours [16:32:10] PROBLEM - Puppet freshness on ssl1001 is CRITICAL: Puppet has not run in the last 10 hours [16:32:10] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [16:32:10] PROBLEM - Puppet freshness on williams is CRITICAL: Puppet has not run in the last 10 hours [16:32:27] Reedy: do you have merge and push rights on that repo? I +1ed it. 
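The NTP alert storm at the top of this log was the leap-second aftershock: a month after the July leap second, linne/dobson reported an apparent 1s step in time to some clients until the clocks were stepped (see the 00:33 !log entry and Gerrit change 17176, "Restart the NTP client if hit by the leap second bug", merged at 11:52). The shell below is only a rough sketch of that kind of remediation, not the contents of change 17176 (which this log never quotes); the 900 ms threshold and the `ntp` service name are assumptions.

```sh
#!/bin/sh
# Rough sketch only -- not Gerrit change 17176, which this log does not quote.
# Restart ntpd when the selected peer's offset is roughly a whole second,
# which is the symptom logged above. Threshold and service name are assumed.
offset_ms=$(ntpq -pn | awk '/^\*/ {print $9}')  # offset column of the selected (*) peer, in ms
abs_ms=${offset_ms#-}                           # drop a leading minus sign
if [ -n "$abs_ms" ] && [ "$(echo "$abs_ms > 900" | bc)" = "1" ]; then
    service ntp restart                         # lets ntpd step the clock back into sync on start
fi
```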
[16:33:04] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [16:33:04] PROBLEM - Puppet freshness on grosley is CRITICAL: Puppet has not run in the last 10 hours [16:33:04] PROBLEM - Puppet freshness on kaulen is CRITICAL: Puppet has not run in the last 10 hours [16:33:04] PROBLEM - Puppet freshness on sodium is CRITICAL: Puppet has not run in the last 10 hours [16:33:04] PROBLEM - Puppet freshness on virt3 is CRITICAL: Puppet has not run in the last 10 hours [16:33:05] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [16:33:05] PROBLEM - Puppet freshness on virt1 is CRITICAL: Puppet has not run in the last 10 hours [16:33:06] PROBLEM - Puppet freshness on ssl4 is CRITICAL: Puppet has not run in the last 10 hours [16:34:07] PROBLEM - Puppet freshness on aluminium is CRITICAL: Puppet has not run in the last 10 hours [16:34:07] PROBLEM - Puppet freshness on gallium is CRITICAL: Puppet has not run in the last 10 hours [16:34:07] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [16:34:07] PROBLEM - Puppet freshness on virt7 is CRITICAL: Puppet has not run in the last 10 hours [16:34:07] PROBLEM - Puppet freshness on sanger is CRITICAL: Puppet has not run in the last 10 hours [16:35:10] PROBLEM - Puppet freshness on marmontel is CRITICAL: Puppet has not run in the last 10 hours [16:35:10] PROBLEM - Puppet freshness on ssl1002 is CRITICAL: Puppet has not run in the last 10 hours [16:35:10] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: Puppet has not run in the last 10 hours [16:35:41] New patchset: Demon; "Overhauling gerrit manifest to be a role class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/13484 [16:36:04] PROBLEM - Puppet freshness on ekrem is CRITICAL: Puppet has not run in the last 10 hours [16:36:04] PROBLEM - Puppet freshness on fenari is CRITICAL: Puppet has not run in the last 10 hours [16:36:04] PROBLEM - Puppet freshness on ssl3 is CRITICAL: Puppet has not run in the last 10 hours [16:36:04] PROBLEM - Puppet freshness on ssl3002 is CRITICAL: Puppet has not run in the last 10 hours [16:36:04] PROBLEM - Puppet freshness on nfs1 is CRITICAL: Puppet has not run in the last 10 hours [16:36:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/13484 [16:37:07] PROBLEM - Puppet freshness on ssl2 is CRITICAL: Puppet has not run in the last 10 hours [16:37:33] RobHalsell: could you update racktables for ms-be1008 and ms-be1012? [16:37:40] yep, sorry about that [16:37:54] New patchset: Mark Bergsma; "Revert "certs: use c_rehash instead of manually symlinking"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17193 [16:37:57] np, I just don't want to confuse future swift ring manipulators. 
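Earlier in the log (15:06-15:08) andrewbogott found puppet restarting itself after `service puppet stop`, because cron and the init defaults bring the agent back. A minimal sketch of the `START=no @ /etc/default/puppet` suggestion, assuming the stock Debian/Ubuntu packaging of the puppet agent:

```sh
# Minimal sketch of the "START=no @ /etc/default/puppet" suggestion above,
# assuming the stock Debian/Ubuntu puppet agent packaging.
sudo service puppet stop
sudo sed -i 's/^START=.*/START=no/' /etc/default/puppet  # keep the init script from bringing it back
# A cron-driven "puppet agent" run (the other thing restarting it, per the log)
# still has to be commented out of the relevant crontab separately.
```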
[16:38:37] New patchset: Mark Bergsma; "Revert "Follow up to change 17065, adding the rapidssl ca source back in"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17194 [16:39:04] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [16:39:04] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [16:39:04] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours [16:39:10] maplebed: fixed [16:39:11] mark: it was broken but I was told that someone immediately fixed it [16:39:19] after merging it [16:39:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17193 [16:39:20] New review: Mark Bergsma; "Puppet is currently broken, feel free to remerge after fixing" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/17193 [16:39:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17194 [16:39:23] jeremyb: wasn't that the case? [16:39:39] jeremyb: I think you gave it the -1 initially and you told me so [16:39:43] New review: Mark Bergsma; "Puppet is currently broken, feel free to remerge after fixing" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/17194 [16:39:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17194 [16:39:54] New patchset: Mark Bergsma; "Revert "certs: use c_rehash instead of manually symlinking"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17193 [16:40:07] PROBLEM - Puppet freshness on argon is CRITICAL: Puppet has not run in the last 10 hours [16:40:07] PROBLEM - Puppet freshness on virt6 is CRITICAL: Puppet has not run in the last 10 hours [16:40:07] PROBLEM - Puppet freshness on virt1000 is CRITICAL: Puppet has not run in the last 10 hours [16:40:20] paravoid: no idea [16:40:21] mark: or is it broken for a different reason? [16:40:23] don't want to figure it out now [16:40:27] what was broken exactly? [16:40:37] i'm just reverting the cert related changes until someone can figure it out [16:40:38] New patchset: Pyoungmeister; "page triage cleanup: use correct syntax for mwscript" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17195 [16:40:42] err: Could not apply complete catalog: Found 1 dependency cycle: [16:40:42] (File[/etc/ssl/certs/star.wikibooks.org.pem] => Exec[c_rehash] => Class[Certificates::Base] => Install_certificate[star.wikibooks.org] => File[/etc/ssl/certs/star.wikibooks.org.pem]) [16:40:42] Try the '--graph' option and opening the resulting '.dot' file in OmniGraffle or GraphViz [16:40:50] cooool! [16:41:10] PROBLEM - Puppet freshness on virt8 is CRITICAL: Puppet has not run in the last 10 hours [16:41:20] hey Reedy, could you take a look at https://gerrit.wikimedia.org/r/17195 and make sure that my syntax for invoking mwscript is correct? [16:41:20] New review: Mark Bergsma; "Puppet is currently broken, feel free to remerge after fixing" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/17193 [16:41:20] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17193 [16:41:20] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17195 [16:41:41] New review: Mark Bergsma; "Puppet is currently broken, feel free to remerge after fixing" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/17193 [16:41:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17193 [16:42:40] RobHalsell: ping [16:42:58] paravoid: ma rk reverted both yours and also the immediate followup ;) [16:43:07] RECOVERY - Puppet freshness on ssl1 is OK: puppet ran at Wed Aug 1 16:42:50 UTC 2012 [16:43:07] PROBLEM - Puppet freshness on ssl3003 is CRITICAL: Puppet has not run in the last 10 hours [16:43:07] PROBLEM - Puppet freshness on virt5 is CRITICAL: Puppet has not run in the last 10 hours [16:43:07] RECOVERY - Puppet freshness on virt8 is OK: puppet ran at Wed Aug 1 16:43:01 UTC 2012 [16:43:25] RECOVERY - Puppet freshness on virt1000 is OK: puppet ran at Wed Aug 1 16:43:09 UTC 2012 [16:43:58] and now puppet freshness is recovering [16:45:04] RECOVERY - Puppet freshness on ssl1004 is OK: puppet ran at Wed Aug 1 16:45:00 UTC 2012 [16:47:10] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [16:47:10] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours [16:48:28] Reedy: thanks! [16:48:46] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17195 [16:49:54] aude: heyas [16:50:10] RECOVERY - Puppet freshness on nickel is OK: puppet ran at Wed Aug 1 16:49:55 UTC 2012 [16:50:12] sorry about that, had tunnel vision on something else [16:50:49] RobHalsell: just wondering when we get wikimania videos in (on disk i assume), any suggestions on how we can get them uploaded to a holding bin / dropbox type place? [16:51:05] so we can review and put together the metadata before putting on commons [16:51:08] honestly the easiest thing to do is mail me a hard disk [16:51:13] RobHalsell: exactly [16:51:21] * aude hoping you'd say that ;) [16:51:30] RobHalsell: is giving you a disk easier than getting a place to rsync to? [16:51:35] it will probably be a couple more weeks [16:51:37] we have a sata to usb disk toaster thing already [16:51:46] so it doesnt even need to be an external disk [16:51:52] jeremyb: it would be faster i think unless someone has a great internet connection [16:52:01] RobHalsell: ok [16:52:07] RECOVERY - Puppet freshness on ssl4 is OK: puppet ran at Wed Aug 1 16:51:41 UTC 2012 [16:52:08] we tend to allocate a host for a month or two to transcode videos and the like [16:52:16] RobHalsell: ok :) [16:52:23] Roan handled this past years [16:52:32] i just allocated the hardware for him and plugged in the disks [16:52:52] use labs? [16:52:54] RobHalsell: so we could theoretically have someone just rsync straight to said misc host? [16:53:05] mark: for this amount of storage? [16:53:10] RECOVERY - Puppet freshness on ssl2 is OK: puppet ran at Wed Aug 1 16:52:45 UTC 2012 [16:53:10] RECOVERY - Puppet freshness on ssl3002 is OK: puppet ran at Wed Aug 1 16:52:54 UTC 2012 [16:53:12] what amount of storage? [16:53:24] if labs cant handle that it's pretty useless isn' it [16:53:25] mark: i'm guess more than a TB if it's multiple disks [16:53:27] jeremyb: thats more annoying. 
[16:53:35] then we have to setup some access and such [16:53:46] use labs [16:53:48] that's what it's for [16:53:59] we could if you think it's reliable enough [16:54:04] RECOVERY - Puppet freshness on sodium is OK: puppet ran at Wed Aug 1 16:54:03 UTC 2012 [16:54:10] mark: checked your schedule? [16:54:28] * aude would just like to get the disk to someone like rob who can get the raw files online somewhere [16:54:51] well, when we do the plug in the disk, its only available to ops and like... roan. [16:54:52] e.g. online but private such as dropbox type thing [16:55:02] add a few s1.xlarge instances and email labs-l to make sure that nobody migrates it ;) [16:55:07] but if you want others to work and access them, then what we have done in the past is not ideal [16:55:18] then you want labs, indeed [16:55:21] RobHalsell: then you can tranfer them to somewhere we can access? [16:55:31] RobHalsell: would it not be possible to just stick it in the labs vlan? i guess too much work [16:55:33] not any differently than you do [16:55:33] it might be me and jeremyb , but not really sure [16:55:34] RECOVERY - Puppet freshness on singer is OK: puppet ran at Wed Aug 1 16:55:18 UTC 2012 [16:55:34] RECOVERY - Puppet freshness on kaulen is OK: puppet ran at Wed Aug 1 16:55:33 UTC 2012 [16:55:37] labs isnt bare metal [16:55:47] * paravoid coughs. [16:55:48] you would have to ask ryan how easy it would be to access osmething like that from them [16:56:10] RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Wed Aug 1 16:55:42 UTC 2012 [16:56:12] ie: i have no idea if it is feasible to expect to plug in an external disk into some box and have a labs instance be able to reach it [16:56:35] copying the files in from a physical server is likely gonna be easier [16:56:37] if its terabytes of video, i can understand wanting to have some kind of local disk transfer [16:56:40] RobHalsell: that's more leslie's dept. and the answer is no ;) [16:56:52] that would work fine [16:56:53] I know some vm hosts can't [16:56:57] jeremyb: how is that leslie? [16:57:04] i meant into the physical virtual host server [16:57:09] and then have the vm moun tit [16:57:09] ah [16:57:12] mark, could you take a look at https://gerrit.wikimedia.org/r/#/c/16990/ ? [16:57:12] seems like ryans area [16:57:15] hrm, interesting [16:57:19] RobHalsell: what can mount from one vlan to another? 
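Back on the 16:40 catalog failure: puppet's own hint is to re-run with `--graph` and open the generated `.dot` file, which makes the reported cycle (File[...star.wikibooks.org.pem] -> Exec[c_rehash] -> Class[Certificates::Base] -> back to the File) visible. A sketch of that workflow, assuming Graphviz is installed; the graph filenames are puppet's defaults:

```sh
# Sketch of the "--graph" hint from the error above; assumes Graphviz is installed.
puppet agent --test --graph
graphdir=$(puppet agent --configprint graphdir)           # where puppet wrote the .dot files
dot -Tpng "$graphdir/expanded_relationships.dot" -o /tmp/relationships.png
# The dependency cycle shows up as a loop in the rendered graph; breaking it
# means dropping one of the require/before/notify edges between those resources.
```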
[16:57:22] MaxSem: yes [16:57:39] jeremyb: not vlan, what can a virtual host access on the actual hardware hosts they reside on [16:57:54] RobHalsell: ohhhhhhhhhh [16:58:07] RECOVERY - Puppet freshness on virt5 is OK: puppet ran at Wed Aug 1 16:57:52 UTC 2012 [16:58:34] RECOVERY - Puppet freshness on fenari is OK: puppet ran at Wed Aug 1 16:58:23 UTC 2012 [16:58:36] heh, i automatically go to 'what solution has the least hardware involved' [16:59:01] RECOVERY - Puppet freshness on marmontel is OK: puppet ran at Wed Aug 1 16:58:39 UTC 2012 [16:59:02] s3 [16:59:04] ;-) [16:59:10] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [16:59:15] plugging a sata disk thats in our sata to usb toaster thing into the actual server that houses the virtual machine that they would use for transcoding [16:59:44] in my brain i want to say that it wouldn't, because it can't access the local storage on the machine [16:59:48] but i have no idea how well our systems can handle that with openstack and such [16:59:52] i'm going ot start googling [17:00:08] well, you can in esx, esxi, and regular vmware virtual server [17:00:09] LeslieCarr: well you could virsh it... [17:00:11] or you used to be able to [17:00:28] all of those solutions have drawbacks so we dont use them though ;] [17:00:28] and you can in qemu [17:00:39] i don't remember if we use xen or kvm [17:00:39] also can in parallels [17:00:46] i want to say you can in kvm [17:00:49] but its been years. [17:00:52] RobH: the point was kvm == qemu [17:01:07] RECOVERY - Puppet freshness on williams is OK: puppet ran at Wed Aug 1 17:01:00 UTC 2012 [17:01:12] so we should poke ryan when he comes online [17:01:20] yeah, he's not here physically yet [17:01:34] RECOVERY - Puppet freshness on virt0 is OK: puppet ran at Wed Aug 1 17:01:25 UTC 2012 [17:01:34] RECOVERY - Puppet freshness on nfs2 is OK: puppet ran at Wed Aug 1 17:01:30 UTC 2012 [17:01:40] then aude can have the disks shipped to whatever datacenter we are going to locate the instance on [17:01:47] (i imagine tampa for now) [17:02:01] RECOVERY - Puppet freshness on virt1 is OK: puppet ran at Wed Aug 1 17:01:44 UTC 2012 [17:02:05] RobH: that works [17:02:07] New review: Mark Bergsma; "Almost there. 
:)" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/16990 [17:02:13] but only if the software supports it, so we'll find out later from ryan [17:02:19] ok [17:02:31] RobH: it might be easier (though not as elegant) to attach it to a machine, i open up a port, and we rsync the files to labs instances [17:02:36] yeah [17:02:37] RECOVERY - Puppet freshness on aluminium is OK: puppet ran at Wed Aug 1 17:02:11 UTC 2012 [17:02:38] far easier [17:02:40] * aude originally envisioned getting disks right after wikimania and delivering them to ashburn [17:02:43] no need to open ports even [17:02:47] labs can just reach public servers [17:02:49] didn't work exactly like that ;) [17:02:57] oh, right of course [17:02:59] then may as well send it to ashburn [17:03:04] cuz i have more hsots there to plug this crap into [17:03:06] heh [17:03:22] RECOVERY - Puppet freshness on gallium is OK: puppet ran at Wed Aug 1 17:03:07 UTC 2012 [17:03:24] whatever you think is best [17:03:25] attach disk, rsync files to labs instance with a lot of storage, unplug disk, be happy [17:03:36] can even be rob's laptop [17:03:40] RECOVERY - Puppet freshness on grosley is OK: puppet ran at Wed Aug 1 17:03:23 UTC 2012 [17:03:46] no need to attach to any servers [17:04:01] my laptop is an air. [17:04:07] usb to usb to usb to usb.... [17:04:07] RECOVERY - Puppet freshness on hooper is OK: puppet ran at Wed Aug 1 17:04:01 UTC 2012 [17:04:42] poor you [17:04:55] just means its non ideal for this.... [17:05:03] so we use a misc host for a bit ;] [17:05:10] RECOVERY - Puppet freshness on sanger is OK: puppet ran at Wed Aug 1 17:04:58 UTC 2012 [17:05:38] RECOVERY - Puppet freshness on formey is OK: puppet ran at Wed Aug 1 17:05:21 UTC 2012 [17:06:04] RECOVERY - Puppet freshness on ssl3 is OK: puppet ran at Wed Aug 1 17:05:45 UTC 2012 [17:06:17] hehe [17:06:40] RECOVERY - Puppet freshness on virt7 is OK: puppet ran at Wed Aug 1 17:06:11 UTC 2012 [17:06:40] RECOVERY - Puppet freshness on virt6 is OK: puppet ran at Wed Aug 1 17:06:18 UTC 2012 [17:07:34] yea... im coming to the realization my next laptop is returning to the thinkbooks. [17:07:34] RECOVERY - Puppet freshness on ssl3003 is OK: puppet ran at Wed Aug 1 17:07:20 UTC 2012 [17:07:34] RECOVERY - Puppet freshness on nfs1 is OK: puppet ran at Wed Aug 1 17:07:31 UTC 2012 [17:07:51] and thus also embracing linux as my desktop =P [17:07:53] i'm pretty sure it's gonna be a retina mbp [17:08:05] i'll just miss the NIC port [17:08:07] i dislike the non swapping battery/ram/parts [17:08:10] RECOVERY - Puppet freshness on virt3 is OK: puppet ran at Wed Aug 1 17:07:39 UTC 2012 [17:08:10] RECOVERY - Puppet freshness on ssl1003 is OK: puppet ran at Wed Aug 1 17:07:49 UTC 2012 [17:08:17] well,t he thunderbolt gige isn't bad... [17:08:19] that just means it'll have to be well specced from the start ;p [17:08:27] i find it easier to accept that when its ultralight [17:08:37] RECOVERY - Puppet freshness on ekrem is OK: puppet ran at Wed Aug 1 17:08:25 UTC 2012 [17:08:43] though i admit, i used brions mac for a few minutes [17:08:48] on the full native res of the new display [17:08:51] i could work on that. 
[17:09:00] the only thing i've ever swapped in this one is the hd for an ssd [17:10:07] RECOVERY - Puppet freshness on ssl3001 is OK: puppet ran at Wed Aug 1 17:09:59 UTC 2012 [17:11:01] RECOVERY - Puppet freshness on argon is OK: puppet ran at Wed Aug 1 17:10:36 UTC 2012 [17:11:01] RECOVERY - Puppet freshness on ssl1001 is OK: puppet ran at Wed Aug 1 17:10:46 UTC 2012 [17:11:27] LeslieCarr: i'm upgrading the new EX4200s and EX4500 in esams to 11.4R2.14 [17:11:39] to be able to form a VC between them [17:12:04] RECOVERY - Puppet freshness on ssl1002 is OK: puppet ran at Wed Aug 1 17:11:46 UTC 2012 [17:12:39] we'll have to upgrade the existing stack too [17:12:46] cool [17:12:54] yeah [17:13:07] RECOVERY - Puppet freshness on manganese is OK: puppet ran at Wed Aug 1 17:12:34 UTC 2012 [17:13:07] RECOVERY - Puppet freshness on virt4 is OK: puppet ran at Wed Aug 1 17:12:35 UTC 2012 [17:13:09] also, new optics will arrive on friday [17:13:13] so will be able to do the router migration after that [17:13:22] hrm [17:13:24] New patchset: MaxSem; "Commit live hack that enables GeoData on testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17197 [17:13:25] just thinking of it [17:13:30] I think i'm one 3m VC cable short :( [17:17:59] oh noes [17:20:10] PROBLEM - Puppet freshness on calcium is CRITICAL: Puppet has not run in the last 10 hours [17:37:34] RECOVERY - Puppet freshness on spence is OK: puppet ran at Wed Aug 1 17:37:13 UTC 2012 [17:37:42] !log pushing dns typo correction [17:37:50] Logged the message, Master [17:39:52] Change merged: preilly; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17197 [17:44:57] PROBLEM - mysqld processes on db63 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:48:04] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17146 [18:09:51] RECOVERY - swift-object-server on ms-be1005 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:09:51] RECOVERY - swift-container-server on ms-be1005 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [18:10:09] RECOVERY - swift-container-auditor on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:10:09] RECOVERY - swift-account-auditor on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:10:18] RECOVERY - swift-container-updater on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:10:18] RECOVERY - NTP on ms-be1005 is OK: NTP OK: Offset 0.004585623741 secs [18:10:27] RECOVERY - swift-account-server on ms-be1005 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [18:10:36] RECOVERY - swift-object-auditor on ms-be1005 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [18:10:36] RECOVERY - swift-object-updater on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:10:54] RECOVERY - swift-account-reaper on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:10:54] RECOVERY - swift-container-replicator on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:10:54] RECOVERY - swift-object-replicator on ms-be1005 is OK: PROCS 
OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:21:29] New patchset: Alex Monk; "(bug 38926) Create Project/Project Talk namespace aliases on arwikinews." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17209 [18:29:04] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [18:29:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [18:30:37] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [18:30:42] RECOVERY - NTP on ms-be1009 is OK: NTP OK: Offset 0.008412241936 secs [18:30:51] RECOVERY - swift-container-server on ms-be1009 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [18:31:00] RECOVERY - swift-object-replicator on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:31:09] RECOVERY - swift-container-auditor on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:31:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [18:31:18] RECOVERY - swift-object-server on ms-be1009 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:31:18] RECOVERY - swift-account-server on ms-be1009 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [18:31:52] RECOVERY - swift-container-updater on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:31:52] RECOVERY - swift-object-updater on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:32:10] RECOVERY - swift-account-auditor on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:32:10] RECOVERY - swift-object-auditor on ms-be1009 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [18:32:37] RECOVERY - swift-account-reaper on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:32:37] RECOVERY - swift-container-replicator on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:34:43] PROBLEM - swift-account-reaper on ms-be1007 is CRITICAL: Connection refused by host [18:34:43] PROBLEM - swift-object-replicator on ms-be1007 is CRITICAL: Connection refused by host [18:34:43] PROBLEM - swift-container-replicator on ms-be1007 is CRITICAL: Connection refused by host [18:35:01] PROBLEM - swift-account-replicator on ms-be1007 is CRITICAL: Connection refused by host [18:35:01] PROBLEM - swift-object-server on ms-be1007 is CRITICAL: Connection refused by host [18:35:01] PROBLEM - swift-container-server on ms-be1007 is CRITICAL: Connection refused by host [18:35:03] New patchset: Aaron Schulz; "Added thumb_handler.php entry point to git."
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17212 [18:35:19] PROBLEM - swift-account-server on ms-be1007 is CRITICAL: Connection refused by host [18:35:19] PROBLEM - swift-container-updater on ms-be1007 is CRITICAL: Connection refused by host [18:35:28] PROBLEM - swift-object-updater on ms-be1007 is CRITICAL: Connection refused by host [18:35:46] PROBLEM - swift-container-auditor on ms-be1007 is CRITICAL: Connection refused by host [18:35:46] PROBLEM - swift-object-auditor on ms-be1007 is CRITICAL: Connection refused by host [18:35:46] PROBLEM - swift-account-auditor on ms-be1007 is CRITICAL: Connection refused by host [18:36:50] MaxSem: /usr/local/sbin, not /usr/bin [18:37:04] mark, not root anymore [18:37:24] still can't be in /usr/bin [18:37:35] programs not installed by packages should put stuff under /usr/local [18:38:02] oh, and forgot to remove root [18:38:07] yeah [18:38:17] you need to put in something else though [18:38:21] which user will it run as? [18:38:25] wlm [18:39:39] and while you're at it [18:39:43] quote your resource titles [18:39:56] i think this one would work, but puppet often barfs on weird characters [18:40:05] your cron resource I mean [18:40:21] mark: and systemuser? [18:40:29] yeah [18:42:37] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [18:43:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [18:43:21] MaxSem: if it's NOT root, it needs to be in /usr/local/bin [18:43:41] grrr [18:43:53] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17212 [18:44:01] * MaxSem needs to read FHS [18:44:02] also [18:44:11] I would like the update stuff to live in a separate subclass [18:44:14] would be much cleaner [18:44:18] I think i commented that earlier too [18:44:32] or at least not in one big long file resource list [18:44:35] it's a bit unclear now [18:44:45] if you keep it in one class, at least separate it out visually a bit, with comments [18:45:14] mark, I replied asking for guideline on this stuff [18:45:31] yeah but we don't really have anything written out yet [18:46:13] it's mostly a matter of good coding practices though [18:46:21] in php you're not putting everything in one function either right [18:47:06] hmm, have you seen our parser? XD [18:48:03] good argument right there [18:54:23] j^: about webm mime.type, the only thing I know is that it is in /etc/mime.type on Ubuntu Precise [18:54:55] j^: and I have no idea which system / software actually serves the files behind the nginx proxy :/ [18:55:41] I did MIME type stuff for Apache before [18:55:47] But ms7 runs some other web server [18:55:58] is upload.wm.org ultimately served by Apache ? [18:56:06] No [18:56:09] But bits is [18:56:22] I had to do MIME poking for WebFonts, but that was bits, which is Apache [18:56:24] upload is something else [18:56:26] cause that would be just about adding the AddType video/webm .webm [18:56:30] And it's not puppetized *stab* [18:56:51] ben explained me the thumb system [18:56:56] I eventually completely forgot about it [18:57:13] I got lost in the too many layers involved [18:57:45] paravoid: are you around and could you help me with reprepro? 
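On the WebM MIME-type question just above: for the Apache-served parts (bits), it really is the one `AddType video/webm .webm` line mentioned at 18:56. A sketch follows, with the conf.d filename as an assumption; it says nothing about the nginx/ms7 layers that upload.wikimedia.org itself sits behind.

```sh
# Sketch for the Apache-served case only; the conf.d path is an assumption,
# not where the (unpuppetized) production config actually lives.
echo 'AddType video/webm .webm' | sudo tee /etc/apache2/conf.d/webm-mime.conf
sudo apache2ctl configtest && sudo service apache2 reload
```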
[18:58:08] I know how the system works but I don't know much about ms7 and ms5 themselves [18:58:18] Like which one runs which OS and which web server is used [18:58:30] ms5 runs linux; ms7 solaris. [18:58:54] OK [18:58:58] And which web servers are used? [18:59:13] j^: I will later, I thought mark was going through some review ping-pong with you [18:59:24] Ah, ms5 runs nginx apparently [18:59:25] maplebed: about to leave but shoot if it's something quick [18:59:29] This is actually puppetized [18:59:36] maplebed: I was talking about Jan mail to ops and adding a mime type to .webm files served by upload.wm.org [18:59:46] paravoid: a package I need is only in the lucid-wikimedia repo; I need it on precise. [19:00:04] the package is python and I don't think has anything version-specific. [19:00:11] reprepro copy :) [19:00:19] I found https://wikitech.wikimedia.org/view/Reprepro#Copying_between_distributions but it's not working. [19:00:27] "Will not copy as not found: ganglia-logtailer." [19:00:27] one would assume that ms5 would run iis ;-) [19:00:51] what did you type exactly? [19:01:00] reprepro --ignore=undefinedtarget copy lucid-wikimedia precise-wikimedia ganglia-logtailer [19:01:06] oh damn. [19:01:11] thanks, carboard debugger. [19:01:16] I got the arguments backwards. [19:01:19] it's destination source [19:01:20] right [19:01:37] and I don't think you need the --ignore [19:01:53] without it I get [19:01:54] Error: packages database contains unused 'karmic-wikimedia|main|amd64' database. [19:01:54] This either means you removed a distribution, component or architecture from [19:01:54] the distributions config file without calling clearvanished, or your config [19:01:54] does not belong to this database. [19:02:03] To ignore use --ignore=undefinedtarget. [19:02:04] [19:02:10] ah [19:02:14] let me fix that [19:02:36] paravoid: thanks; reversing the arguments worked and it looks happy. [19:03:05] that's because mark (rightfully) removed karmic, the database needs cleanup [19:03:23] !log installing a couple lib upgrades on fenari [19:03:31] Logged the message, Master [19:03:42] ok, ran clearvanished [19:04:08] !log Ran reprepro --delete clearvanished on brewster to cleanup removed repositories karmic-wikimedia and oneiric-wikimedia [19:04:16] Logged the message, Master [19:04:41] !log started hotbackup of db1017 to db63 [19:04:49] Logged the message, Master [19:06:01] New patchset: MaxSem; "Wiki Loves Monuments API server, RT#3221" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [19:06:40] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/16990 [19:07:07] RECOVERY - swift-account-replicator on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:07:34] RECOVERY - swift-container-replicator on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [19:07:43] RECOVERY - swift-account-reaper on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [19:07:43] RECOVERY - swift-object-auditor on ms-be1007 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [19:07:43] RECOVERY - swift-container-server on ms-be1007 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [19:07:43] RECOVERY - swift-account-replicator on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:07:43] RECOVERY - swift-object-server on ms-be1007 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [19:07:44] RECOVERY - swift-object-replicator on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [19:08:01] RECOVERY - swift-account-replicator on ms-be1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:08:01] RECOVERY - swift-container-updater on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [19:08:01] RECOVERY - swift-object-updater on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [19:08:10] RECOVERY - swift-account-auditor on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [19:08:19] RECOVERY - swift-account-server on ms-be1007 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [19:08:28] RECOVERY - swift-container-auditor on ms-be1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:26:17] PROBLEM - swift-account-replicator on ms-be1003 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:35:12] RobH: are you in eqiad today? [19:35:41] maplebed: nope, was yesterday [19:35:46] can be, whats up? [19:35:46] k. [19:36:06] ms-be1005 isn't behaving. I should poke at the bios more before giving you a ticket though. [19:36:21] from parted: Error: Error opening /dev/sdc: No such device or address [19:38:51] anyway. lunch now. [19:42:23] mark, I think I've done everything as you said [19:45:37] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/16990 [19:46:22] merged [19:48:50] woo-hoo, thanks! [19:50:59] can someone run puppet on yttrium to make that change apply? ^^ [19:52:29] that doesn't let me in [19:53:41] i'm going off now [19:54:34] MaxSem: lemme check [19:57:07] MaxSem: Ok, its never been run before [19:57:13] so i am going ahead and getting it done now [19:57:39] move wiki test, move wiki test! [19:57:57] this is two seconds of work [19:58:01] moving test wiki sounds like more [19:58:24] i have too many other things that only i can work on right now ;] (im being nice to do this one!!) 
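Restating the reprepro exchange from 19:00-19:04, since the argument order was the whole problem: `reprepro copy` takes the destination distribution first, then the source, then the package names, and `clearvanished` removes the stale databases left behind when a distribution (karmic/oneiric here) is deleted from the config, after which `--ignore=undefinedtarget` is no longer needed.

```sh
# Destination first, then source -- the reversed order is what produced
# "Will not copy as not found: ganglia-logtailer" above.
reprepro copy precise-wikimedia lucid-wikimedia ganglia-logtailer

# Drop databases for distributions removed from the config (as was done on
# brewster at 19:04), so the --ignore=undefinedtarget workaround isn't needed.
reprepro --delete clearvanished
```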
[19:59:57] ok, puppet is updating, lessee what it does [20:05:31] New patchset: Jeremyb; "bug 38610 - ukwiktionary logo -> commons" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17263 [20:06:58] MaxSem: yea.... its a bunch of catalog runs behind, this will take a bit [20:07:06] RobH, thanks [20:09:47] package dependency issues [20:09:54] MaxSem: you guys have this setup working in labs? [20:10:01] with all of the stuff thats now included? [20:10:13] yes, on mobile-wlm [20:10:37] hrmm [20:10:43] i wonder whats causing this then [20:11:40] RobH, could this be because new labs instances aren't completely "naked"? [20:11:50] seems to be having an issue with libapache2-mod-php5 [20:12:10] well, new labs instance is idential to a new server instance except for its networking [20:12:15] was my understanding [20:12:28] so other than its not bare metal and its in a specific vlan (natted) is it [20:14:42] issue is where thats called, and a package conflict [20:15:51] * MaxSem tried removing that package and running puppet - it got reinstalled properly [20:16:58] heh, will do [20:17:09] just was reading to see exactly what it had issue with on old install of it [20:17:15] but i suppose if it doesnt do it again, who cares =] [20:17:22] I mean, that's how I verified this bit on Labs [20:18:25] damn [20:18:31] still throws issue [20:18:40] what is the error? [20:20:00] http://pastebin.com/y5DZKhZZ [20:20:03] output of run [20:22:54] New patchset: Jeremyb; "uawikimedia logo: update to match local override" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17330 [20:23:26] huh - was puppet able to install any other packages, or did it fail like that for all of them? [20:23:47] that was the only fail [20:23:53] but when it hits that, it halts [20:24:35] it looks like it failed on php5-mysql and libapache2-mod-php5 [20:24:54] hrm [20:25:13] is at apt-get update getting run before all of that? [20:26:16] it should, but i can trigger a manual one and try [20:28:09] no dice [20:28:11] still fails [20:28:16] :( [20:31:17] hrmm, im poking around to see if i can figure it out [20:31:31] but i also need to break and eat something, i realized a few minutes ago i have not eaten today [20:31:36] and its 430pm already =P [20:31:47] don't try to kill yourself! [20:31:57] go eat [20:32:21] surprised you haven't gotten a little twingy [20:32:22] MaxSem: systemuser's title is still not quoted [20:32:31] this is me trying to guilt some other op to foolishly volunteer to figure this out ;] [20:34:24] cmjohnson1: https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/976027 [20:34:34] cmjohnson1: http://askubuntu.com/questions/143678/i-receive-the-error-grub-install-dev-sda-failed-while-attempting-to-install-u [20:35:55] jeremyb, there are lots of manifests written like that. one of them I used as an example. puppet parser validate doesn't seem to mind [20:36:39] MaxSem: i'm just saying if you're fixing some, why not fix them all. 
not that it's necessarily worth another gerrit change [20:37:10] I mean, yes it could use more polishing but we need that thing up and running by last week [20:38:57] jeremyb, you're overestimating my gerrit-fu :) I just take pieces of other manifests as examples and fix what I'm told to:) [20:41:37] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17263 [20:41:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17330 [20:42:22] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17158 [20:43:33] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17209 [20:44:12] cmjohnson1: i'm going to double check the partition, just in case.... [20:44:27] ok [20:45:06] it is using the db cfg [20:46:04] hehe it won't let me change the partitioning around with this [20:46:18] i'm guessing it's picking up the partman file every time it starts to go to the menu [20:50:27] PROBLEM - Host ms-be1009 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:27] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:27] PROBLEM - Host ms-be1011 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:27] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:27] PROBLEM - Host ms-be1003 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:28] PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:28] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:29] PROBLEM - Host ms-be1002 is DOWN: PING CRITICAL - Packet loss = 100% [20:50:36] PROBLEM - Host ms-fe1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:51:30] RECOVERY - swift-object-updater on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [20:51:30] RECOVERY - swift-container-updater on ms-be1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [20:51:30] RECOVERY - Host ms-be1006 is UP: PING OK - Packet loss = 0%, RTA = 35.47 ms [20:51:30] RECOVERY - Host ms-be1009 is UP: PING OK - Packet loss = 0%, RTA = 35.68 ms [20:51:30] RECOVERY - Host ms-be1011 is UP: PING OK - Packet loss = 0%, RTA = 35.42 ms [20:51:39] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 35.42 ms [20:51:39] RECOVERY - Host ms-be1002 is UP: PING OK - Packet loss = 0%, RTA = 35.39 ms [20:51:39] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 35.54 ms [20:51:57] RECOVERY - Host ms-be1010 is UP: PING OK - Packet loss = 0%, RTA = 35.43 ms [20:51:58] Are we going to have like 5 new extensions somewhat deployed this week then? :/ [20:51:59] ^^^ those are all in eqiad and it's ok. 
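For the yttrium package failures discussed above, the quickest way to see the real error is usually to take Puppet out of the picture and drive apt by hand. A rough debugging sequence on the affected host; the package names are the ones from the paste, and the apt and puppet commands are standard rather than anything specific to this manifest:

    # Refresh the package index first; a stale index is a common cause of
    # install failures that only show up through Puppet.
    apt-get update

    # Install the failing packages by hand to get apt's own dependency or
    # conflict message instead of Puppet's wrapped version of it.
    apt-get install libapache2-mod-php5 php5-mysql

    # Check which repository and version apt would pick, in case pinning or a
    # wikimedia-repo version is winning over the expected one.
    apt-cache policy libapache2-mod-php5 php5-mysql

    # Once the manual install succeeds (or the conflict is understood), re-run
    # the agent to confirm the catalog applies cleanly.
    puppet agent --test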
[20:52:03] rargh, wrong channel [20:52:15] PROBLEM - Host ms-fe1002 is DOWN: PING CRITICAL - Packet loss = 100% [20:52:24] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [20:52:29] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/16875 [20:52:33] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [20:53:09] RECOVERY - Memcached on ms-fe1001 is OK: TCP OK - 9.034 second response time on port 11211 [20:53:18] RECOVERY - Host ms-fe1001 is UP: PING OK - Packet loss = 0%, RTA = 35.58 ms [20:53:36] RECOVERY - Host ms-fe1002 is UP: PING OK - Packet loss = 0%, RTA = 35.40 ms [20:53:54] PROBLEM - Host ms-fe1003 is DOWN: PING CRITICAL - Packet loss = 100% [20:53:58] New review: Jeremyb; "of course this was actually ukwikiquote not ukwiktionary. patch was fine, only commit msg was wrong" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17263 [20:54:21] PROBLEM - Host ms-fe1004 is DOWN: PING CRITICAL - Packet loss = 100% [20:54:48] RECOVERY - Memcached on ms-fe1003 is OK: TCP OK - 0.035 second response time on port 11211 [20:54:57] RECOVERY - Host ms-fe1003 is UP: PING OK - Packet loss = 0%, RTA = 35.51 ms [20:55:06] RECOVERY - Memcached on ms-fe1004 is OK: TCP OK - 0.035 second response time on port 11211 [20:55:15] RECOVERY - Host ms-fe1004 is UP: PING OK - Packet loss = 0%, RTA = 35.39 ms [20:55:42] RECOVERY - Host ms-be1003 is UP: PING OK - Packet loss = 0%, RTA = 35.44 ms [20:55:58] New patchset: Bhartshorne; "removing swiftcleaner's check against ms5 since it's now out of the lop." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17334 [20:56:08] MaxSem, RobH i can't replicate the problem in a labs instance, puppet seems to run just fine and properly install the packages [20:56:31] my food delivery just got here, taking a break to eat =] [20:56:39] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17334 [20:56:40] no worries [20:56:48] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17334 [20:56:51] enjoy your meal RobH [20:58:42] PROBLEM - SSH on ms-be1003 is CRITICAL: Connection refused [20:58:51] PROBLEM - swift-container-updater on ms-be1003 is CRITICAL: Connection refused by host [20:59:00] PROBLEM - swift-object-updater on ms-be1003 is CRITICAL: Connection refused by host [20:59:00] PROBLEM - swift-account-auditor on ms-be1003 is CRITICAL: Connection refused by host [20:59:00] PROBLEM - swift-object-auditor on ms-be1003 is CRITICAL: Connection refused by host [20:59:18] PROBLEM - swift-account-server on ms-be1003 is CRITICAL: Connection refused by host [20:59:18] PROBLEM - swift-container-auditor on ms-be1003 is CRITICAL: Connection refused by host [20:59:18] PROBLEM - swift-container-replicator on ms-be1003 is CRITICAL: Connection refused by host [20:59:36] PROBLEM - swift-object-replicator on ms-be1003 is CRITICAL: Connection refused by host [20:59:54] PROBLEM - swift-object-server on ms-be1003 is CRITICAL: Connection refused by host [20:59:54] PROBLEM - swift-container-server on ms-be1003 is CRITICAL: Connection refused by host [20:59:54] PROBLEM - swift-account-reaper on ms-be1003 is CRITICAL: Connection refused by host [21:06:55] binasher: so, there's turning up the mc ports, however we need to figure out a good way to ensure that the secondary interfaces have the right ip's and default routes [21:07:31] binasher: some way that's automated, don't want to have to do everything by hand by default [21:07:51] but, still has puppet know which machine is which ip, and nagios freaks out if 1 machine has two ip's... [21:09:20] what nagios check freaks out? [21:10:22] New patchset: Aaron Schulz; "Configured $wgTimedTextForeignNamespaces for commons." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17337 [21:10:32] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17337 [21:10:42] nagios itself, the hosts file [21:10:52] RobH - do you know which 10g cards we have ? [21:11:00] is it in a PO perhaps ? [21:11:09] supposedly a lot have pxe support - http://www.dell.com/ed/business/p/cna-network-interconnects/product-compare [21:13:07] lesliecarr: rt 2882 [21:13:30] thanks chris :) [21:15:11] ok, supposedy has pxe support [21:15:35] now to see if there's any docs on how to make it pxe boot... [21:18:37] grrr dell's website sorta sucks [21:18:57] RobH/cmjohnson1 is there a phone line with tech support that's not for your windows desktop ? [21:19:12] yep... [21:19:38] 800-456-3355 [21:19:42] dell enterprise support 1800-945-3355 [21:19:47] then they ask the service tag of the system [21:19:52] or that to ^ [21:19:52] which you can pull from racktables [21:20:07] cmjohnson1: huh, i didnt have that one [21:20:08] noted [21:20:45] cool [21:21:07] cmjohnson1: so does a person pickup on that one? on the one i gave you have to say hardware, then support [21:21:09] i wonder what 3355 spells [21:21:10] blah blah [21:21:11] ;) [21:21:41] not directly...but you only have to say or press 2 to get to a person [21:21:48] I really hate ringing dell, which I might have to do tomorrow :( Damn CMC software issues [21:21:53] i'm going to try calling them [21:22:00] thank god the whiskey is only 10 feet away [21:23:10] is the SN the same as the service tag ? [21:23:18] yep [21:23:23] cool [21:25:25] lesliecarr: santa? 
really!?! [21:25:41] cmjohnson1: :p i couldn't think of anything [21:26:29] i wasn't in the military [21:26:55] hehe looked it up [21:28:53] New patchset: Pyoungmeister; "mediawiki and application server modules." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17342 [21:29:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17342 [21:57:13] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [21:59:18] PROBLEM - Puppet freshness on cp1032 is CRITICAL: Puppet has not run in the last 10 hours [22:01:11] New patchset: Aaron Schulz; "Switched testwikis to multiwrite backend." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17348 [22:02:47] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17348 [22:05:13] RobH did you have a chance to poke that weird puppet stuff yet? [22:30:09] New patchset: Aaron Schulz; "Made mw.org use the multiwrite backend." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17353 [22:30:32] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17353 [22:33:19] heh [22:33:28] might be good to ask someone else [22:33:41] I'm focusing completely on development for the next week or so [22:39:20] PROBLEM - Puppet freshness on srv281 is CRITICAL: Puppet has not run in the last 10 hours [22:42:09] j^: if you're struggling to find an ops person to help, if you ask woosters nicely he may be able to find you someone [22:43:05] I'm here but I have no idea what needs to be done [22:43:10] and I have no access to ms7 afaik [22:43:22] ms7 is about to be killed and does not run puppet [22:43:36] so I never complained for not having access [22:43:43] eep, not have keys either i bet [22:43:48] (hating solaris might have something to do with that too) [22:43:59] oh i have access [22:44:01] do we know who does have access? [22:44:03] ewww 2005 [22:44:08] ok i have no idea what to do [22:44:14] it's sunos 5.10 [22:44:18] Ariel would possibly be the best bet [22:44:25] uptime 919 days, that's impressive [22:44:28] yes, apergos would definitely know [22:44:28] But is seemingly MIA... [22:44:37] we need to add a mime type [22:44:38] omfg and someone has been logged in via the console for years [22:44:54] tell me that's a figure of speech :) [22:44:56] can we confrim what web server is running there? [22:44:59] no it's not [22:45:10] gah [22:45:15] i don't know how to ps aux [22:45:18] I'm sure it's got known uptime related bugs [22:45:19] i hate you solaris [22:45:25] Reedy: it's solaris [22:45:28] LeslieCarr: ps -ef [22:45:32] thanks [22:45:35] so, I'm in ms5 [22:45:46] looking. [22:45:49] webservd [22:45:53] hahahahaha [22:46:02] Server: Sun-Java-System-Web-Server/7.0 [22:46:06] maplebed: what's the timeline with ms5/ms7? [22:46:12] next two weeks. [22:46:24] for both of them? [22:46:40] Party time [22:46:43] ms5 isn't relevant for the mime type thing though, only ms7. [22:46:54] (it's not a thumbnail getting served) [22:47:07] ah, and I was about to fix that [22:47:08] dammit :P [22:47:09] is there a obj.conf file? [22:47:11] it's runnign under the apache user ? [22:47:41] using find..... [22:48:25] no obj.conf in /etc [22:49:02] #--Sun Microsystems Inc. MIME Information [22:49:02] # Do not delete the above line. It is used to identify the file type [22:49:04] lol. 
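For anyone retracing the ms7 detective work above, identifying an unfamiliar web server comes down to the Server response header from outside plus the process list and SMF on the box itself. A short sketch of those checks; the URL is illustrative, while the process name and service instance are the ones that actually turned up here:

    # From anywhere: the Server header names the software.
    curl -sI http://upload.wikimedia.org/ | grep -i '^Server:'
    #   Server: Sun-Java-System-Web-Server/7.0

    # On Solaris it's ps -ef rather than ps aux; the Sun web server shows up
    # as webservd.
    ps -ef | grep webservd

    # The server runs under SMF, so the service instance (and when it last
    # changed state) is visible with svcs.
    svcs | grep http
    #   online  2010  svc:/network/http:https-ms7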
[22:49:34] you can test that it "worked" with: [22:49:34] curl -I "http://upload.wikimedia.org/wikipedia/test2/7/7c/1_b0q4jyja.webm" [22:49:38] I got in [22:49:48] root@ms7 # cat /etc/mime.types [22:49:48] should have content type video/webm instead of Content-Type: text/plain [22:49:48] #--Netscape Communications Corporation MIME Information [22:49:48] #Do not delete the above line. It is used to identify the file type. [22:49:49] New patchset: Pyoungmeister; "moving db role classes to role/db.pp." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17355 [22:49:51] #mime types added by Netscape Helper [22:49:51] that's about it [22:49:53] type=application/x-java-jnlp-file desc="Java Web Start" exts="jnlp" [22:49:55] hrm [22:49:57] I doubt that's in use :) [22:49:58] yeah [22:50:01] same thing here [22:50:21] ah, found it [22:50:22] paravoid: http://docs.oracle.com/cd/E19528-01/819-2630/gaieg/index.html ? [22:50:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17355 [22:50:36] it's in /opt/webserver7/https-ms7/config [22:50:47] you wouldn't find https-ms7 in any docs :) [22:51:03] oh yay [22:51:03] i wonder why https not http [22:51:30] oh come on, https-ms7 is so the international standard [22:51:44] jeremyb: it's the http service. [22:52:05] (abbreviated, of course, https) [22:52:20] seriously?!!! [22:52:22] so [22:52:25] Awesomes [22:52:32] let's see how to reload that now [22:52:35] haha [22:52:46] reboot? [22:52:47] you might reload it… or you might take ms7 down and it will never work [22:52:48] ;) [22:52:48] good luck [22:52:49] haha [22:52:56] New patchset: Asher; "run query analysis if a cluster is defined, vs. if in tampa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17356 [22:53:00] nah, it's probably going to be svcadm [22:53:00] now i'm giggling damnit [22:53:21] online 2010 svc:/network/http:https-ms7 [22:53:34] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17356 [22:53:55] for you following at home [22:54:00] 2010 is when this state was last changed [22:54:07] i.e. when the webserver was last reloaded [22:55:43] does it have any fail over? [22:56:02] nope [22:56:11] as we said before, it's to be replaced within the two weeks [22:57:54] maplebed: ready to switch it over tonight? ;) [22:58:02] nope. [22:58:11] content will be lost if we're forced to switch now. [22:58:43] paravoid: LeslieCarr: at least you can say you did some solaris admin stuff for the WMF now ;) [22:59:19] to be clear though, to move content to swift requires nfs, not http. [22:59:33] hehe [22:59:49] maplebed: are the kitten pictures safe ? [22:59:52] that's the important part [23:00:12] heh... I've got at least 5 backed up on my laptop. [23:00:44] You need over 9000. srsly. [23:02:10] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17356 [23:02:22] PROBLEM - swift-account-replicator on ms-be1008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [23:02:24] yes [23:02:41] I mean, we can troll cuteoverload but their pics may not be cc-by-sa [23:03:49] LeslieCarr: here ya go: http://omgcatsinspace.tumblr.com/ [23:04:02] omg !!!! [23:04:12] ok, i'm done with work for the day [23:04:18] i'm going to have a cat in space seizure [23:04:47] :) [23:04:52] j^: how important is it to happen now? 
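Putting the pieces above together, the fix itself is small: one line in the instance's mime.types and a reload. This is only a sketch; the exts syntax is modelled on the existing jnlp entry, and whether a refresh is enough or a full restart is needed was never tested here, which is part of the reluctance to touch a box with 900-plus days of uptime:

    # Add the webm type to the instance MIME map found under
    # /opt/webserver7/https-ms7/config, using the same type=.../exts=... form
    # as the jnlp entry already in the file.
    echo 'type=video/webm exts=webm' >> /opt/webserver7/https-ms7/config/mime.types

    # Reload through SMF; refresh is the gentler option, restart the heavier one.
    svcadm refresh svc:/network/http:https-ms7
    # svcadm restart svc:/network/http:https-ms7

    # Verify: the header should now report video/webm instead of text/plain.
    curl -I "http://upload.wikimedia.org/wikipedia/test2/7/7c/1_b0q4jyja.webm"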
[23:05:04] j^: I'd prefer to stick on the safe side and wait for apergos in about 7 hours [23:05:28] if it's imperative to happen now I can risk it and run a restart [23:06:03] I just feel a bit uneasy restarting a SPOF webserver that hasn't been restarted for 900 days [23:06:10] oh I didn't say wait for the swift migration [23:06:26] yes [23:06:28] that makes sense [23:06:39] If it's a SPOF webserver it can't be /that/ important :D [23:06:40] Ariel/apergos, a member of our time is really this box's expert [23:06:42] wait for swift .. no sense in touching something that has not been touched for close to 3 years [23:06:55] Damianz: um, have you seen our infrastructure? ;) [23:07:05] Oh, deploy and run? [23:07:08] he'll probably know what to do tomorrow [23:07:15] still with the solaris box [23:07:51] j^: sorry for this, this is at least the second time I'm blocking you :) [23:08:06] New patchset: Pyoungmeister; "addin an es_eqiad nagios group" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17358 [23:08:07] cat is safe: http://tstarling.com/stuff/wall_cat/ [23:08:45] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/17358 [23:10:43] how did the cat get into the wall ? [23:10:46] it looks pretty fat [23:11:31] PROBLEM - Apache HTTP on srv234 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error [23:11:31] poor cat [23:11:49] PROBLEM - Apache HTTP on srv236 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error [23:11:57] j^: I replied to that mail. Ariel usually comes early in our morning, which is in about 7 hours [23:12:20] the house is split level, with two stories on one side and one in the middle on the other [23:12:29] that's the floor level of the upper storey [23:12:54] she got in under the floor via the attic, where there are gaps in the floor [23:14:03] New review: Pyoungmeister; "asher gave thumbs up." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/17355 [23:14:03] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17355 [23:14:25] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17358 [23:15:27] TimStarling, unbelievable :O [23:16:49] New patchset: Asher; "include misc servers in ishmael" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17359 [23:17:27] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17359 [23:23:13] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [23:26:23] cmjohnson1 / RobH https://rt.wikimedia.org/Ticket/Display.html?id=3362 [23:27:16] !log rebuilding wikitech-l archives [23:27:24] Logged the message, Master [23:39:54] New patchset: Catrope; "(bug 38903) Enable SubPageList3 on cswiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17362 [23:40:30] we <3 RoanKattouw [23:41:21] I thought Danny_B|backup might be involved there ;) [23:41:45] obviously since i'm reporter öf it ;-) [23:42:04] I was meaning cs in the dbname :p [23:42:45] i see [23:42:52] Reedy: can you cherrypick that? 
[23:43:03] I don't need to [23:43:06] just review and push it [23:43:24] whatever needed to make it live ;-) [23:43:28] RECOVERY - mysqld processes on db63 is OK: PROCS OK: 1 process with command name mysqld [23:46:51] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/17362 [23:47:04] PROBLEM - MySQL Replication Heartbeat on db63 is CRITICAL: CRIT replication delay 7290 seconds [23:47:22] PROBLEM - MySQL Slave Delay on db63 is CRITICAL: CRIT replication delay 7274 seconds [23:47:50] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [23:55:01] PROBLEM - Misc_Db_Lag on db10 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 605s [23:57:47] hrmmmmmm [23:57:50] replag [23:58:03] @replag [23:58:05] Nemo_bis: No replag currently. See also "replag all". [23:58:10] It's not cluster replag [23:58:11] Nemo_bis: it's off cluster [23:58:12] ah right [23:58:21] well, "cluster" as in "wiki cluster" [23:58:25] mpf so quick at correcting me [23:58:42] Though, I thought most things had been moved from db9/db10 already? [23:58:42] dbbot-wm: you could bemore zealous anyway, don't stare me in that way [23:58:49] just pls ping me when it's live, thank you guys [23:59:07] oops [23:59:10] I forgot to press enter [23:59:26] done [23:59:31] RECOVERY - Apache HTTP on srv234 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [23:59:31] PROBLEM - Misc_Db_Lag on db10 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 608s [23:59:56] hah [23:59:57] Reedy: :-* [23:59:58] RECOVERY - Apache HTTP on srv236 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.035 second response time
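On the cherry-pick question at the end: reviewing and submitting the change in Gerrit was the simpler route, but for completeness, pulling an open change into a local checkout looks roughly like this. The refs/changes layout is standard Gerrit; the patchset number and the exact clone URL are assumptions for illustration, not taken from the log:

    # Gerrit publishes each patchset at refs/changes/<last two digits of the
    # change>/<change number>/<patchset>, so for change 17362, patchset 1:
    git fetch https://gerrit.wikimedia.org/r/operations/mediawiki-config refs/changes/62/17362/1
    git cherry-pick FETCH_HEAD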
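The db10 and db63 lag alerts in the last few minutes all key off a single field on the replica. Checking it by hand is a one-liner; this is generic MySQL, not the exact Nagios plugin invocation used here:

    # Seconds_Behind_Master is what Misc_Db_Lag and MySQL Slave Delay alarm on;
    # NULL means replication is not running at all, so the thread states are
    # worth grabbing in the same pass.
    mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind_Master|Slave_(IO|SQL)_Running'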