[01:03:39] (PS1) Reedy: Fix undefined $errstr usage [operations/software] - https://gerrit.wikimedia.org/r/125682
[01:04:42] lovely variable name
[01:04:59] (PS3) Reedy: Remove prettify, seems to be unused [operations/software] - https://gerrit.wikimedia.org/r/118952
[01:47:47] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:48:07] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:48:38] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[01:48:57] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[02:12:38] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 2889 MB (3% inode=99%):
[02:12:52] (PS2) Chad: Use proper php.ini comment format [operations/puppet] - https://gerrit.wikimedia.org/r/121695
[02:13:00] (Abandoned) Chad: Use proper php.ini comment format [operations/puppet] - https://gerrit.wikimedia.org/r/121695 (owner: Chad)
[02:18:37] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3811 MB (3% inode=99%):
[02:20:27] PROBLEM - Disk space on lvs3001 is CRITICAL: DISK CRITICAL - free space: / 1777 MB (3% inode=97%):
[02:23:01] !log LocalisationUpdate completed (1.23wmf21) at 2014-04-14 02:22:58+00:00
[02:23:07] Logged the message, Master
[02:42:07] !log LocalisationUpdate completed (1.23wmf22) at 2014-04-14 02:42:05+00:00
[02:42:12] Logged the message, Master
[02:47:17] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:47:17] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:47:17] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[02:47:17] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[03:00:27] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.000277777777778
[03:00:37] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:04:53] (CR) Ori.livneh: [C: 2] Fix undefined $errstr usage [operations/software] - https://gerrit.wikimedia.org/r/125682 (owner: Reedy)
[03:23:03] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 14 03:22:58 UTC 2014 (duration 22m 57s)
[03:23:07] Logged the message, Master
[03:48:07] PROBLEM - MySQL InnoDB on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:49:47] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[03:49:57] RECOVERY - MySQL InnoDB on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[03:50:38] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[04:00:27] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[04:25:04] ugh
[04:25:18] Gloria: I ugh'd.
[04:25:24] It's the beginning of the end.
[04:28:01] Ugh.
[04:36:05] grrrit-wm is slow
[04:36:10] (PS1) Ori.livneh: Graphite/Carbon: set max open files via Upstart 'limit' directive [operations/puppet] - https://gerrit.wikimedia.org/r/125686
[04:36:13] (PS1) Ori.livneh: Graphite/Carbon: less logspam [operations/puppet] - https://gerrit.wikimedia.org/r/125687
[04:37:00] (CR) Ori.livneh: [C: 2 V: 2] Graphite/Carbon: set max open files via Upstart 'limit' directive [operations/puppet] - https://gerrit.wikimedia.org/r/125686 (owner: Ori.livneh)
[04:37:38] (CR) Ori.livneh: [C: 2] Graphite/Carbon: less logspam [operations/puppet] - https://gerrit.wikimedia.org/r/125687 (owner: Ori.livneh)
[05:02:24] hi _joe_
[05:03:15] <_joe_> hi matanya
[05:03:28] <_joe_> just reconnected as TOR seems to be dead
[05:03:45] <_joe_> well, TOR on freenode
[05:33:38] PROBLEM - Graphite Carbon on tungsten is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[05:34:38] RECOVERY - Graphite Carbon on tungsten is OK: OK: All defined Carbon jobs are runnning.
[05:48:17] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:48:17] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:48:17] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:48:17] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[05:55:26] <_joe_> off to school - BB in ~ 1 hour
[06:00:27] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.000277700638711
[07:00:27] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[07:08:28] (CR) Dzahn: "the ticket says there is a remaining question what to do with /root and is not 100% clear to me if it can be shutdown" [operations/puppet] - https://gerrit.wikimedia.org/r/123626 (owner: Matanya)
[07:15:15] (CR) Alexandros Kosiaris: [C: 2] Create and import shapelines from a pre-existing dump [operations/puppet] - https://gerrit.wikimedia.org/r/123767 (owner: Alexandros Kosiaris)
[07:42:49] (CR) Dzahn: [C: -1] "yes, i was going to merge this because it is indeed what upstream changed and the point is to reduce the diff between default and custom f" (4 comments) [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 (owner: 01tonythomas)
[07:45:47] (PS3) Dzahn: decom : brewster [operations/puppet] - https://gerrit.wikimedia.org/r/123626 (owner: Matanya)
[07:50:54] (CR) Dzahn: [C: 2] "the only remaining question was what to do with old package files in /root. please speak up on Alex' mail thread "apt.wikimedia.org migrat" [operations/puppet] - https://gerrit.wikimedia.org/r/123626 (owner: Matanya)
[07:54:16] !log brewster - disabling puppet agent, removed from site.pp, revoke puppet cert
[07:54:22] Logged the message, Master
[07:59:04] forgot to do this earlier
[07:59:38] waves, hi apergos
[08:06:45] apergos: fyi, /root/atg on brewster has "utfnormal" packaging stuff, and "mwbzutils", if you think that should be saved / is not on the new apt.wm, now would be a good time
[08:07:06] because of brewster decom and the files in /root there
[08:07:07] toss please
[08:07:14] great, thx
[08:07:22] yw
[08:30:00] (PS1) Dzahn: decom: remove brewster incl. mgmt [operations/dns] - https://gerrit.wikimedia.org/r/125695
[08:31:40] search looks to be broken on wikitech
[08:32:48] <_joe_> apergos: yes I can confirm this.
[08:33:03] <_joe_> I'm getting a white page
[08:33:07] I"ll look at it in a little bit then
[08:33:15] <_joe_> which usually is a php error
[08:33:16] (encountered this in doing something else)
[08:33:30] yep I expect there is an exception of some sort
[08:33:40] confirmed
[08:33:52] maybe a "virt0" somewhere hardcoded left
[08:33:59] where it is now virt1000 ?
[08:36:56] not in error.log hrmm
[08:39:13] config/Local.php:$wgCirrusSearchServers = array( '10.2.2.30' );
[08:40:44] <_joe_> eh.
[08:41:04] it can talk to that, it's lvs1003
[08:41:23] from virt1000
[08:41:27] which is now wikitech
[08:41:53] but i did not look for any recent changes in that config yet
[08:44:12] (Abandoned) Alexandros Kosiaris: Remove DNS entries for brewster [operations/dns] - https://gerrit.wikimedia.org/r/123206 (owner: Alexandros Kosiaris)
[08:44:23] (CR) Alexandros Kosiaris: [C: 2] decom: remove brewster incl. mgmt [operations/dns] - https://gerrit.wikimedia.org/r/125695 (owner: Dzahn)
[08:48:38] PROBLEM - Puppet freshness on lvs3001 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:48:38] PROBLEM - Puppet freshness on lvs3003 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:48:38] PROBLEM - Puppet freshness on lvs3002 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:48:38] PROBLEM - Puppet freshness on lvs3004 is CRITICAL: Last successful Puppet run was Thu 01 Jan 1970 12:00:00 AM UTC
[08:50:15] maybe broken by something similar to " Merge "beta: drop pmtpa reference for CirrusSearch""
[09:01:17] !log brewster - stop lighttpd,bacula-fd,haproxy,dhcp3-server,rsync,nrpe,salt
[09:01:22] Logged the message, Master
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1001 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1002 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1003 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1004 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1005 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1006 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:05:03] ACKNOWLEDGEMENT - check configured eth on virt1007 is CRITICAL: virbr0 reporting no carrier. daniel_zahn see RT #7251
[09:10:34] CUSTOM - check_job_queue on terbium is CRITICAL: JOBQUEUE CRITICAL - the following wikis have more than 199,999 jobs: , enwiki (336047), Total (455012)
[09:10:47] should the limit be raised again? ^
[09:11:04] 199k limit vs. 336k actual
[09:11:58] PROBLEM - SSH on lvs1002 is CRITICAL: Server answer:
[09:14:58] RECOVERY - SSH on lvs1002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0)
[09:24:31] https://bugzilla.wikimedia.org/show_bug.cgi?id=63839 meh
[09:26:05] !log deleting huge pybal log on lvs3001
[09:26:11] Logged the message, Master
[09:26:28] RECOVERY - Disk space on lvs3001 is OK: DISK OK
[09:32:57] probably 3002-3004 are the same?
[09:36:58] after restarting apache, logs to the error log went there instead of to a deleted error.log.1; I updated the bug reoprt with the php fatal
[09:37:06] for search on wikitech
[09:37:52] the disk space issue was only on that 1 host
[09:38:01] puppet disabled on all though
[09:38:24] apergos: ah
[09:50:00] ACKNOWLEDGEMENT - Disk space on virt1000 is CRITICAL: DISK CRITICAL - free space: / 2304 MB (3% inode=85%): daniel_zahn due to backups from migration. need to remind Andrew though to move to new backup location
[09:54:51] apergos: Can you check extensions were recursively checked out for wikitech please?
[10:05:28] !log had to toss extensions/Elastica on virt1000 and run git submodule update --init --recursive seems to be working now
[10:05:34] Logged the message, Master
[10:05:39] thanks ree dy
[10:11:38] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Mon 14 Apr 2014 07:10:31 AM UTC
[10:14:32] "Slow CirrusSearch query rate"
[10:23:12] (CR) Dzahn: [C: -1] "./check_apt_mirror: 14: ./check_apt_mirror: let: not found" (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/112738 (owner: Matanya)
[10:30:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[10:33:55] (PS2) Dzahn: apt mirror: add monitoring script to verify mirror is up to date [operations/puppet] - https://gerrit.wikimedia.org/r/112738 (owner: Matanya)
[10:35:44] (CR) Dzahn: [C: 1] "- "-lt 2 -o -eq 2" is the same as just "-le 2"" [operations/puppet] - https://gerrit.wikimedia.org/r/112738 (owner: Matanya)
[10:37:50] wth was that
[10:38:50] <_joe_> paravoid: a spike in 5xx responses for a minute, it seems
[10:39:47] seems like mobile esams
[10:40:55] http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=cp3012.esams.wikimedia.org&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1397471984&g=mem_report&z=large&c=Mobile%20caches%20esams
[10:40:59] *sigh*
[10:41:07] still the same shit
[10:41:17] compact_memory not having the right effect, still
[10:49:07] are you saying domas was wrong? ;p
[10:49:24] !log reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
[10:49:30] Logged the message, Master
[10:49:44] (PS1) Reedy: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125703
[10:50:13] (CR) Reedy: [C: 2] Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125703 (owner: Reedy)
[10:50:20] (Merged) jenkins-bot: Update interwiki cache [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125703 (owner: Reedy)
[10:50:38] (CR) Dzahn: [C: 2] "it does the job like this. of course needs to be added to NRPE check" [operations/puppet] - https://gerrit.wikimedia.org/r/112738 (owner: Matanya)
[10:53:23] (CR) TTO: "While you're in an interwiki mood, Reedy, could you please take a look at Iec1ee0ab80a16ae6f5d89528c060331881f72c4a and maybe even Ie327c7" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125703 (owner: Reedy)
[10:54:17] (CR) Dzahn: "and please take this as well when syncing mw config anyways, changes a comment line only" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123853 (owner: Dzahn)
[10:58:08] (PS1) Faidon Liambotis: ssl: fix whitespace-damaged files [operations/puppet] - https://gerrit.wikimedia.org/r/125704
[10:58:31] (PS2) Faidon Liambotis: ssl: fix whitespace-damaged certificates [operations/puppet] - https://gerrit.wikimedia.org/r/125704
[10:59:13] (CR) Faidon Liambotis: [C: 2 V: 2] ssl: fix whitespace-damaged certificates [operations/puppet] - https://gerrit.wikimedia.org/r/125704 (owner: Faidon Liambotis)
[11:06:50] (PS1) Dzahn: remove ms6 ipmi, decom [operations/dns] - https://gerrit.wikimedia.org/r/125705
[11:08:41] !log brewster - shut down
[11:08:45] Logged the message, Master
[11:10:16] (CR) ArielGlenn: [C: 2] remove ms6 ipmi, decom [operations/dns] - https://gerrit.wikimedia.org/r/125705 (owner: Dzahn)
[11:12:56] (CR) Dzahn: "apergos, it was also this" [operations/dns] - https://gerrit.wikimedia.org/r/124901 (owner: Matanya)
[11:13:35] (CR) Dzahn: "also removing mgmt in Change-Id: Idf410b3e02a3" [operations/dns] - https://gerrit.wikimedia.org/r/124901 (owner: Matanya)
[11:15:23] (CR) Dzahn: [C: 2] "changes to the script itself in a separate change on purpose" [operations/puppet] - https://gerrit.wikimedia.org/r/123852 (owner: Dzahn)
[11:18:14] (CR) Dzahn: "matanya, let's make a new patch that handles the non-db67 db's in here" [operations/puppet] - https://gerrit.wikimedia.org/r/122406 (owner: Dzahn)
[11:18:49] (CR) Dzahn: [C: 1] decom: remove brewster incl. mgmt [operations/dns] - https://gerrit.wikimedia.org/r/125695 (owner: Dzahn)
[11:20:28] (PS4) Dzahn: include 'bastionhost' on bastion hosts [operations/puppet] - https://gerrit.wikimedia.org/r/122399
[11:20:36] (CR) Dzahn: [C: 1] include 'bastionhost' on bastion hosts [operations/puppet] - https://gerrit.wikimedia.org/r/122399 (owner: Dzahn)
[11:25:28] (PS3) Dzahn: lint role/deployment [operations/puppet] - https://gerrit.wikimedia.org/r/122338
[11:25:34] (CR) Dzahn: lint role/deployment (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[11:28:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[11:29:47] (PS2) Dzahn: lint labsproxy.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122335
[11:29:58] (CR) Dzahn: lint labsproxy.pp (4 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[11:31:14] (PS2) Reedy: replace hume with terbium in a comment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123853 (owner: Dzahn)
[11:31:20] (CR) Reedy: [C: 2] replace hume with terbium in a comment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123853 (owner: Dzahn)
[11:31:28] (Merged) jenkins-bot: replace hume with terbium in a comment [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/123853 (owner: Dzahn)
[11:36:17] apergos: you responded "yes we should" for dataset1001's network upgrade; RT ticket? :)
[11:36:32] yes. I did.
[11:37:35] you did file one or you did respond?
[11:37:52] (CR) Giuseppe Lavagetto: [C: 1] "All the lint changes are harmless, including the change of type in the deployer_groups definition" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[11:39:48] I responded only, it should not be done just yet unless it can be done without interruption of the network connection
[11:39:55] in about a day it can happen
[11:40:36] RT ticket so we won't forget?
[11:43:57] (CR) Dzahn: lint role/keystone (labs) (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/122334 (owner: Dzahn)
[11:44:18] (PS3) Andrew Bogott: Remove deprecated roles, and lint remaining labsproxy.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[11:44:27] I'm on rt so tomorrow I'll be looking at it again (and perhaps doing it)
[11:44:58] at least that was my plan, if you want me to pass it to you that's ok too
[11:45:08] (CR) Andrew Bogott: "We're not using the pmtpa/eqiad-proxy bits anymore, so I greatly simplified this by ripping out old code." [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[11:47:00] no, that's fine
[11:47:03] (PS2) Dzahn: lint role/keystone (labs) [operations/puppet] - https://gerrit.wikimedia.org/r/122334
[11:47:16] (CR) ArielGlenn: [C: 1] "I think this is fine, just a note that the line" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[11:48:04] (CR) Dzahn: [C: 1] "nice :)" [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[11:52:02] (CR) Dzahn: "eh, actually it's in here twice, once with true and once with false ?!" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[11:53:03] (PS4) Dzahn: lint role/deployment [operations/puppet] - https://gerrit.wikimedia.org/r/122338
[11:53:12] (CR) Dzahn: [C: 2] "the disabled one is just labs" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[12:00:28] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00138888888889
[12:01:45] (CR) Dzahn: "no changes on tin after puppet run" [operations/puppet] - https://gerrit.wikimedia.org/r/122338 (owner: Dzahn)
[12:03:15] (CR) Dzahn: [C: 2] Remove deprecated roles, and lint remaining labsproxy.pp [operations/puppet] - https://gerrit.wikimedia.org/r/122335 (owner: Dzahn)
[12:06:25] (PS3) Dzahn: lint role/keystone (labs) [operations/puppet] - https://gerrit.wikimedia.org/r/122334
[12:09:15] (PS2) Alexandros Kosiaris: Import coastlines and land polygons into OSM db [operations/puppet] - https://gerrit.wikimedia.org/r/123792
[12:10:28] (CR) Andrew Bogott: [C: 2] lint role/keystone (labs) [operations/puppet] - https://gerrit.wikimedia.org/r/122334 (owner: Dzahn)
[12:12:23] :)
[12:16:25] (CR) Dzahn: [C: 1] "re: importing coastlines. does this get slower while processing Norway?:) Slartibartfast may have copyright http://en.wikipedia.org/wiki/" [operations/puppet] - https://gerrit.wikimedia.org/r/123792 (owner: Alexandros Kosiaris)
[12:18:34] mutante: ahahahaha. Only if the shapefile also contains a stavromula beta
[12:18:51] *gg*
[12:23:02] (CR) Dzahn: "we should make a new bz test instance in eqiad, can do soon" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/124140 (owner: 01tonythomas)
[12:27:11] (CR) Dzahn: "https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=svc" [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn)
[12:30:47] (CR) Alexandros Kosiaris: [C: 2] include 'bastionhost' on bastion hosts [operations/puppet] - https://gerrit.wikimedia.org/r/122399 (owner: Dzahn)
[12:33:10] woot, merge day for me, thanks
[12:33:48] I wish that would ever be the case for me...
[12:34:13] https://gerrit.wikimedia.org/r/116019 still rotting around...
[12:34:33] hoo: I can press abandon for you
[12:34:40] Oh wait, no I can't ;)
[12:35:13] That looks pretty trivial if you just poke someone from Ops...
[12:35:17] :P Less rights, less abuse
[12:35:37] I poked more or less everyone yet... mh
[12:35:46] oh apergos is on rt duty, so I can annoy him this time
[12:35:49] :P
[12:36:26] (PS8) Hoo man: Introduce an admins::release user group [operations/puppet] - https://gerrit.wikimedia.org/r/116019
[12:36:32] mh, it even cleanly rebased
[12:38:44] (CR) Dzahn: [C: 1] "to reply to questions above: the problem this wants to solve is that admins::restricted gives more access than actually needed, it is not " [operations/puppet] - https://gerrit.wikimedia.org/r/116019 (owner: Hoo man)
[12:39:36] don't abandon it, it's good that it points out that restricted does not just mean bastion
[12:40:09] even uploaded a PS
[12:40:14] I wish to do more such changes, but if review is this slow, this will take ages
[12:40:42] hoo: we just talked about entire refactoring of admins.pp with half the ops team in Athens
[12:40:46] something will happen
[12:40:59] hoo: You should add some more reviewers
[12:41:02] * Reedy grins
[12:41:05] it is probably just that everybody wanted to wait for that before introducing that
[12:41:30] and point out again why you added it in the first place (gives extra access etc)
[12:42:00] I explained it to various people on IRC :P
[12:42:14] should have probably noted that in the bug
[12:43:46] (CR) ArielGlenn: "I would prefer that users in this group are added to all bastions, not just bast1001 which happens to be the bation for the primary dc rig" [operations/puppet] - https://gerrit.wikimedia.org/r/116019 (owner: Hoo man)
[12:43:55] gah typos. anyways
[12:44:04] (CR) Dzahn: "admins::restricted gives terbium access. terbium access is not needed for release uploaders." [operations/puppet] - https://gerrit.wikimedia.org/r/116019 (owner: Hoo man)
[12:44:58] apergos: They only need the bastion for the release server... they don't need to ssh into pmtpa or even esams, right?
[12:46:21] paravoid: Do you want https://gerrit.wikimedia.org/r/#/c/121912/ deploying?
[12:46:22] there should be one class that gives an account on all (non-ops) bastions but nothing else
[12:46:31] but also more than just one
[12:46:43] Reedy: ?
[12:46:51] I want it merged & deployed, yes :)
[12:46:51] atom recent changes
[12:46:53] right
[12:47:06] mutante: ahhh, that's what I wanted to do first but then someone told me that it's a stupid idea ... *sigh*
[12:47:07] code is sane, just going to fix the code style and add a RELEASE-NOTES entry
[12:47:14] awesome
[12:47:17] thank you very much
[12:47:21] mutante: yep that's what I was getting at, a bastions only class
[12:47:36] then if service A moves to $otherdc we don't have to shuffle access around
[12:48:27] agrees, it should add one bastion in each dc, but also not add more than bastions
[12:48:43] and the rest is about naming
[12:49:01] not changing the existing one called "restricted" is less work
[12:49:07] mutante: Ok, but fenari isn't good as a bastion as hit has a f*cking lot of private data on it
[12:49:17] s/hit/it/
[12:49:31] yea, fenari, i'd say exclude it already
[12:49:47] hopefully it won't be alive that much longer
[12:49:48] If we wait long enough mutante will murder fenari
[12:50:39] heh
[12:51:16] So you people now want the bastions only group which you didn't want first?
[12:51:24] apergos: i agree to both :p fenari is not a good bastion but we have no other pmtpa bastion
[12:51:55] just prepare the class as bastions and if it has just one member for now, it's ok?
[12:52:48] So basically: Back to PS1 and rebase?
[12:52:53] https://gerrit.wikimedia.org/r/#/c/116019/1/manifests/admins.pp
[12:53:08] (ignore the white space changes)
[12:54:18] if you ask me,ehm..yes
[12:55:09] (CR) Alexandros Kosiaris: [C: 2] Import coastlines and land polygons into OSM db [operations/puppet] - https://gerrit.wikimedia.org/r/123792 (owner: Alexandros Kosiaris)
[12:56:09] apergos: i think he's right, don't even encourage them to use fenari anymore, while calling it bastion class to be flexible later
[12:56:38] RECOVERY - Puppet freshness on labsdb1004 is OK: puppet ran at Mon Apr 14 12:56:31 UTC 2014
[12:59:55] (PS1) Dzahn: remove pmtpa payments LVS monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/125715
[13:00:28] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[13:01:03] Ok, will put that thing back to the PS1 state later today
[13:02:16] +1
[13:05:28] PROBLEM - DPKG on mw1149 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:05:37] (PS1) Alexandros Kosiaris: Correct land_polygons file names [operations/puppet] - https://gerrit.wikimedia.org/r/125717
[13:05:38] PROBLEM - DPKG on mw1153 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:05:48] PROBLEM - DPKG on mw1150 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:06:28] RECOVERY - DPKG on mw1149 is OK: All packages OK
[13:06:39] PROBLEM - DPKG on mw1152 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:06:48] RECOVERY - DPKG on mw1150 is OK: All packages OK
[13:07:18] PROBLEM - Apache HTTP on mw1149 is CRITICAL: Connection refused
[13:07:38] RECOVERY - DPKG on mw1152 is OK: All packages OK
[13:07:38] RECOVERY - DPKG on mw1153 is OK: All packages OK
[13:07:38] PROBLEM - Apache HTTP on mw1150 is CRITICAL: Connection refused
[13:07:44] that would be me
[13:07:48] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection refused
[13:08:08] PROBLEM - Apache HTTP on mw1152 is CRITICAL: Connection refused
[13:08:18] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.090 second response time
[13:08:38] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.086 second response time
[13:08:48] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.108 second response time
[13:08:58] PROBLEM - DPKG on mw1151 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:09:08] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.092 second response time
[13:09:58] RECOVERY - DPKG on mw1151 is OK: All packages OK
[13:11:49] (CR) Alexandros Kosiaris: [C: 2] Correct land_polygons file names [operations/puppet] - https://gerrit.wikimedia.org/r/125717 (owner: Alexandros Kosiaris)
[13:13:08] PROBLEM - DPKG on mw1154 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:13:38] PROBLEM - DPKG on mw1155 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:13:58] PROBLEM - DPKG on mw1156 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:14:38] RECOVERY - DPKG on mw1155 is OK: All packages OK
[13:14:38] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection refused
[13:14:38] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection refused
[13:14:58] RECOVERY - DPKG on mw1156 is OK: All packages OK
[13:14:58] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection refused
[13:15:07] !log staggered upgrades for all pending updates on all mw* boxes & restarting apaches/other core services
[13:15:08] RECOVERY - DPKG on mw1154 is OK: All packages OK
[13:15:13] Logged the message, Master
[13:15:58] PROBLEM - DPKG on terbium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:15:58] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.180 second response time
[13:16:38] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.177 second response time
[13:16:39] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.177 second response time
[13:16:39] PROBLEM - DPKG on tin is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:17:39] RECOVERY - DPKG on tin is OK: All packages OK
[13:17:58] RECOVERY - DPKG on terbium is OK: All packages OK
[13:20:18] PROBLEM - DPKG on mw1159 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:20:18] PROBLEM - DPKG on mw1160 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:20:38] PROBLEM - DPKG on mw1158 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:20:38] PROBLEM - DPKG on mw1157 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:21:38] PROBLEM - DPKG on mw1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:21:39] RECOVERY - DPKG on mw1158 is OK: All packages OK
[13:21:39] PROBLEM - DPKG on mw1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:21:39] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection refused
[13:21:39] RECOVERY - DPKG on mw1157 is OK: All packages OK
[13:21:48] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection refused
[13:21:58] PROBLEM - DPKG on mw1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:22:09] !log reedy synchronized php-1.23wmf21/includes/api/ApiFeedRecentChanges.php 'I268d0a53067738ba96bee74c593358b0b28cc083'
[13:22:13] Logged the message, Master
[13:22:14] <_joe_> should I worry?
[13:22:18] RECOVERY - DPKG on mw1160 is OK: All packages OK [13:22:18] RECOVERY - DPKG on mw1159 is OK: All packages OK [13:22:28] no [13:22:28] PROBLEM - DPKG on mw1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:22:31] it's just me [13:22:36] upgrading the universe :) [13:22:38] RECOVERY - DPKG on mw1004 is OK: All packages OK [13:22:38] RECOVERY - DPKG on mw1002 is OK: All packages OK [13:22:38] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.104 second response time [13:22:38] PROBLEM - DPKG on mw1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:22:44] (I !logged it) [13:22:48] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.101 second response time [13:22:54] !log reedy synchronized php-1.23wmf22/includes/api/ApiFeedRecentChanges.php 'I268d0a53067738ba96bee74c593358b0b28cc083' [13:22:58] RECOVERY - DPKG on mw1003 is OK: All packages OK [13:22:58] PROBLEM - DPKG on mw1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:22:58] Logged the message, Master [13:22:59] paravoid: done and done [13:23:02] <_joe_> paravoid: sorry lost it [13:23:26] Reedy: thanks a bunch [13:23:29] RECOVERY - DPKG on mw1001 is OK: All packages OK [13:23:38] RECOVERY - DPKG on mw1006 is OK: All packages OK [13:23:47] _joe_: can't blame you ;) [13:23:56] oh boy how I wish this was more automated [13:23:58] RECOVERY - DPKG on mw1005 is OK: All packages OK [13:24:28] PROBLEM - DPKG on tmh1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:25:58] PROBLEM - DPKG on mw1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:26:18] PROBLEM - DPKG on mw1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:26:18] PROBLEM - DPKG on mw1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:26:28] RECOVERY - DPKG on tmh1001 is OK: All packages OK [13:26:38] PROBLEM - DPKG on mw1007 is CRITICAL: DPKG CRITICAL dpkg reports 
broken packages [13:26:38] PROBLEM - DPKG on mw1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:26:48] PROBLEM - DPKG on mw1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:26:58] RECOVERY - DPKG on mw1011 is OK: All packages OK [13:27:18] RECOVERY - DPKG on mw1008 is OK: All packages OK [13:27:18] RECOVERY - DPKG on mw1012 is OK: All packages OK [13:27:38] RECOVERY - DPKG on mw1007 is OK: All packages OK [13:27:38] RECOVERY - DPKG on mw1009 is OK: All packages OK [13:27:48] RECOVERY - DPKG on mw1010 is OK: All packages OK [13:29:48] PROBLEM - DPKG on tmh1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:29:58] PROBLEM - DPKG on mw1014 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:30:18] PROBLEM - DPKG on mw1013 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:30:28] PROBLEM - DPKG on mw1016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:30:28] PROBLEM - DPKG on mw1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:30:51] (03PS2) 10Manybubbles: Make Elasticsearch more reliable in beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/125331 [13:30:58] RECOVERY - DPKG on mw1014 is OK: All packages OK [13:31:09] ottomata: morning [13:31:18] RECOVERY - DPKG on mw1013 is OK: All packages OK [13:31:28] RECOVERY - DPKG on mw1016 is OK: All packages OK [13:31:28] RECOVERY - DPKG on mw1015 is OK: All packages OK [13:31:48] RECOVERY - DPKG on tmh1002 is OK: All packages OK [13:34:51] (03PS1) 10Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 [13:35:52] (03CR) 10jenkins-bot: [V: 04-1] move LDAP admin permissions,tools out of site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [13:35:58] (03PS2) 10Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 [13:37:01] (03CR) 10jenkins-bot: [V: 04-1] move LDAP admin 
permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [13:38:14] (PS3) Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 [13:39:16] (CR) jenkins-bot: [V: -1] move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [13:40:07] duh [13:41:58] PROBLEM - DPKG on mw1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:42:15] manybubbles: morning! [13:42:29] (PS4) Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 [13:42:58] RECOVERY - DPKG on mw1017 is OK: All packages OK [13:43:25] ahh CirrusSearch query rate [13:43:31] gonna work on that as soon as I get through my emails [13:43:44] i made a tweak to it friday hoping it would be ok, but apparently not [13:43:51] akosiaris, do you still object to https://gerrit.wikimedia.org/r/#/c/125183/ ? [13:44:10] ottomata: I don't see it complaining here. [13:45:47] (Abandoned) Andrew Bogott: Added instancetype fact to get labs instance flavor. [operations/puppet] - https://gerrit.wikimedia.org/r/117823 (owner: Andrew Bogott) [13:45:48] ah ha, oh [13:45:53] i was just looking at scroll back [13:46:01] hah, must have been looking at scrollback from friday [13:46:23] all I see this morning is dpkg [13:46:40] hey, if you end up doing another repartition today let me know [13:46:48] PROBLEM - DPKG on mw1018 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:46:51] (PS3) Andrew Bogott: openstack: qualify var [operations/puppet] - https://gerrit.wikimedia.org/r/119488 (owner: Matanya) [13:47:20] I'd like to sneak in a jvm config change on the restart to see if that speeds things up a tad. I'll do it manually to whichever one you are restarting and if it works, puppet it. 
[13:47:25] if it doesn't I'll revert it [13:47:59] manybubbles: yeah plan on more repartitioning [13:48:00] ok cool [13:48:48] RECOVERY - DPKG on mw1018 is OK: All packages OK [13:49:51] matanya: just to reconfirm my understanding of how this works… if $openstack_version is set in site.pp, that's the same scope as $::openstack_version in other manifests? [13:49:53] (I'm looking at https://gerrit.wikimedia.org/r/#/c/119488/ ) [13:52:08] PROBLEM - DPKG on mw1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:08] PROBLEM - DPKG on mw1049 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:13] andrewbogott: not really. My only question is if this interface is needed at all. [13:52:18] PROBLEM - DPKG on mw1027 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:18] PROBLEM - DPKG on mw1020 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:18] PROBLEM - DPKG on mw1035 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:18] PROBLEM - DPKG on mw1032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:18] PROBLEM - DPKG on mw1045 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:26] (still me) [13:52:28] PROBLEM - DPKG on mw1023 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:28] PROBLEM - DPKG on mw1021 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:28] PROBLEM - DPKG on mw1041 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:28] PROBLEM - DPKG on mw1028 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:34] andrewbogott: no. 
$openstack_version in site.pp != $::openstack_version [13:52:38] PROBLEM - DPKG on mw1031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:38] PROBLEM - DPKG on mw1043 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:38] PROBLEM - DPKG on mw1048 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:38] PROBLEM - DPKG on mw1030 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:38] PROBLEM - DPKG on mw1026 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:39] PROBLEM - DPKG on mw1038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:48] PROBLEM - DPKG on mw1037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:48] PROBLEM - DPKG on mw1047 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:48] PROBLEM - DPKG on mw1039 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:52] akosiaris: it is not needed at all. It is installed by libvirt, which /is/ needed. [13:52:56] paravoid: upgrading ? 
[13:52:58] PROBLEM - DPKG on mw1029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1022 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1019 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1040 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1036 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1025 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:58] PROBLEM - DPKG on mw1044 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:59] PROBLEM - DPKG on mw1046 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:52:59] PROBLEM - DPKG on mw1033 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:53:00] PROBLEM - DPKG on mw1024 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:53:00] PROBLEM - DPKG on mw1042 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:53:16] Seems weird to remove it after the fact just for cleanup, since presumably the libvirt .deb will expect it to be there in the future. 
[13:53:31] akosiaris: yes [13:54:48] RECOVERY - DPKG on mw1037 is OK: All packages OK [13:54:48] RECOVERY - DPKG on mw1047 is OK: All packages OK [13:54:48] RECOVERY - DPKG on mw1039 is OK: All packages OK [13:54:58] RECOVERY - DPKG on mw1040 is OK: All packages OK [13:54:58] RECOVERY - DPKG on mw1019 is OK: All packages OK [13:54:58] RECOVERY - DPKG on mw1029 is OK: All packages OK [13:54:58] RECOVERY - DPKG on mw1022 is OK: All packages OK [13:54:58] RECOVERY - DPKG on mw1044 is OK: All packages OK [13:54:59] RECOVERY - DPKG on mw1025 is OK: All packages OK [13:54:59] RECOVERY - DPKG on mw1036 is OK: All packages OK [13:55:00] RECOVERY - DPKG on mw1046 is OK: All packages OK [13:55:00] RECOVERY - DPKG on mw1033 is OK: All packages OK [13:55:01] RECOVERY - DPKG on mw1024 is OK: All packages OK [13:55:01] RECOVERY - DPKG on mw1042 is OK: All packages OK [13:55:08] RECOVERY - DPKG on mw1034 is OK: All packages OK [13:55:08] RECOVERY - DPKG on mw1049 is OK: All packages OK [13:55:18] RECOVERY - DPKG on mw1027 is OK: All packages OK [13:55:18] RECOVERY - DPKG on mw1020 is OK: All packages OK [13:55:18] RECOVERY - DPKG on mw1035 is OK: All packages OK [13:55:18] RECOVERY - DPKG on mw1032 is OK: All packages OK [13:55:18] RECOVERY - DPKG on mw1045 is OK: All packages OK [13:55:28] RECOVERY - DPKG on mw1023 is OK: All packages OK [13:55:28] RECOVERY - DPKG on mw1021 is OK: All packages OK [13:55:28] RECOVERY - DPKG on mw1041 is OK: All packages OK [13:55:28] RECOVERY - DPKG on mw1028 is OK: All packages OK [13:55:38] RECOVERY - DPKG on mw1048 is OK: All packages OK [13:55:38] RECOVERY - DPKG on mw1043 is OK: All packages OK [13:55:38] RECOVERY - DPKG on mw1030 is OK: All packages OK [13:55:38] RECOVERY - DPKG on mw1031 is OK: All packages OK [13:55:38] RECOVERY - DPKG on mw1026 is OK: All packages OK [13:55:39] RECOVERY - DPKG on mw1038 is OK: All packages OK [13:57:18] PROBLEM - DPKG on mw1050 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:57:48] PROBLEM - 
Apache HTTP on mw1019 is CRITICAL: Connection refused [13:58:18] RECOVERY - DPKG on mw1050 is OK: All packages OK [13:58:48] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 809 bytes in 0.070 second response time [13:59:40] andrewbogott: libvirt-bin.postinst will create that interface based on this /etc/libvirt/qemu/networks/default.xml and this file /etc/libvirt/qemu/networks/autostart/default.xml [14:00:15] akosiaris: ok... [14:00:32] ok this is getting deep into this crappy software called libvirt [14:00:42] akosiaris: Isn't worrying about that interface just like worrying about some random doc or license file that a package installs? Any package will install things we don't need. [14:01:15] messing with the routing table and adding interfaces with IPs != installing a file [14:02:26] I'm not saying that libvirt isn't dumb, just saying that (in our case) it ain't broke so doesn't need fixing [14:02:27] or testing [14:04:19] (CR) Dzahn: [C: 1] decom Tampa: remove service IPs [operations/dns] - https://gerrit.wikimedia.org/r/120063 (owner: Dzahn) [14:05:08] andrewbogott: yeah I get the point. I do feel though this is going to bite at some point. 
But I don't think I want to dig any deeper into libvirt so feel free to merge [14:05:32] this is not the debian package btw, more like the software [14:07:50] (CR) Jgreen: [C: 1] remove pmtpa payments LVS monitoring [operations/puppet] - https://gerrit.wikimedia.org/r/125715 (owner: Dzahn) [14:08:28] PROBLEM - DPKG on mw1073 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:48] PROBLEM - DPKG on mw1057 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:48] PROBLEM - DPKG on mw1069 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1083 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1097 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1059 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1051 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1056 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1096 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:58] PROBLEM - DPKG on mw1078 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:59] PROBLEM - DPKG on mw1058 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:08:59] PROBLEM - DPKG on mw1081 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:00] PROBLEM - DPKG on mw1063 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:00] PROBLEM - DPKG on mw1074 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:01] PROBLEM - DPKG on mw1071 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:08] PROBLEM - DPKG on mw1066 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:08] PROBLEM - DPKG on mw1093 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:18] PROBLEM - DPKG on mw1090 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:18] PROBLEM - DPKG on mw1089 is 
CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:18] PROBLEM - DPKG on mw1065 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:19] PROBLEM - DPKG on mw1054 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:19] PROBLEM - DPKG on mw1086 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:19] PROBLEM - DPKG on mw1085 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:19] PROBLEM - DPKG on mw1052 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:20] PROBLEM - DPKG on mw1072 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:20] PROBLEM - DPKG on mw1076 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:21] PROBLEM - DPKG on mw1064 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:28] PROBLEM - DPKG on mw1079 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:28] PROBLEM - DPKG on mw1061 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:28] PROBLEM - DPKG on mw1053 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:28] PROBLEM - DPKG on mw1091 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:28] PROBLEM - DPKG on mw1095 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:29] PROBLEM - DPKG on mw1055 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:29] PROBLEM - DPKG on mw1060 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:30] PROBLEM - DPKG on mw1092 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:30] RECOVERY - DPKG on mw1073 is OK: All packages OK [14:09:31] PROBLEM - DPKG on mw1099 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:31] PROBLEM - DPKG on mw1084 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1094 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1098 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1075 is 
CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1070 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1082 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1087 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:39] PROBLEM - DPKG on mw1062 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:48] PROBLEM - DPKG on mw1067 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:48] PROBLEM - DPKG on mw1077 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:48] PROBLEM - DPKG on mw1088 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:48] RECOVERY - DPKG on mw1069 is OK: All packages OK [14:09:48] PROBLEM - DPKG on mw1068 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:09:58] RECOVERY - DPKG on mw1083 is OK: All packages OK [14:09:58] RECOVERY - DPKG on mw1097 is OK: All packages OK [14:09:58] RECOVERY - DPKG on mw1051 is OK: All packages OK [14:09:58] RECOVERY - DPKG on mw1059 is OK: All packages OK [14:09:58] RECOVERY - DPKG on mw1096 is OK: All packages OK [14:09:58] RECOVERY - DPKG on mw1078 is OK: All packages OK [14:09:59] RECOVERY - DPKG on mw1058 is OK: All packages OK [14:09:59] RECOVERY - DPKG on mw1081 is OK: All packages OK [14:10:00] RECOVERY - DPKG on mw1063 is OK: All packages OK [14:10:00] RECOVERY - DPKG on mw1074 is OK: All packages OK [14:10:01] RECOVERY - DPKG on mw1071 is OK: All packages OK [14:10:08] RECOVERY - DPKG on mw1066 is OK: All packages OK [14:10:08] RECOVERY - DPKG on mw1093 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1090 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1065 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1086 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1054 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1089 is OK: All packages OK [14:10:18] RECOVERY - DPKG on mw1085 is OK: All packages OK [14:10:18] 
RECOVERY - DPKG on mw1052 is OK: All packages OK [14:10:19] RECOVERY - DPKG on mw1072 is OK: All packages OK [14:10:19] RECOVERY - DPKG on mw1076 is OK: All packages OK [14:10:20] RECOVERY - DPKG on mw1064 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1079 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1053 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1061 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1091 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1095 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1055 is OK: All packages OK [14:10:28] RECOVERY - DPKG on mw1060 is OK: All packages OK [14:10:29] RECOVERY - DPKG on mw1092 is OK: All packages OK [14:10:29] RECOVERY - DPKG on mw1099 is OK: All packages OK [14:10:30] RECOVERY - DPKG on mw1084 is OK: All packages OK [14:10:38] RECOVERY - DPKG on mw1094 is OK: All packages OK [14:10:38] RECOVERY - DPKG on mw1098 is OK: All packages OK [14:10:38] RECOVERY - DPKG on mw1075 is OK: All packages OK [14:10:38] RECOVERY - DPKG on mw1070 is OK: All packages OK [14:10:38] RECOVERY - DPKG on mw1082 is OK: All packages OK [14:10:39] RECOVERY - DPKG on mw1087 is OK: All packages OK [14:10:39] RECOVERY - DPKG on mw1062 is OK: All packages OK [14:10:48] RECOVERY - DPKG on mw1067 is OK: All packages OK [14:10:48] RECOVERY - DPKG on mw1088 is OK: All packages OK [14:10:48] RECOVERY - DPKG on mw1077 is OK: All packages OK [14:10:48] RECOVERY - DPKG on mw1068 is OK: All packages OK [14:10:58] RECOVERY - DPKG on mw1056 is OK: All packages OK [14:11:48] RECOVERY - DPKG on mw1057 is OK: All packages OK [14:18:58] PROBLEM - DPKG on mw1106 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:58] PROBLEM - DPKG on mw1113 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:58] PROBLEM - DPKG on mw1177 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:58] PROBLEM - DPKG on mw1187 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:58] PROBLEM - 
DPKG on mw1110 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:58] PROBLEM - DPKG on mw1170 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:59] PROBLEM - DPKG on mw1173 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:18:59] PROBLEM - DPKG on mw1180 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1163 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1109 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1176 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1184 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1105 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:18] PROBLEM - DPKG on mw1179 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:28] PROBLEM - DPKG on mw1112 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:28] PROBLEM - DPKG on mw1183 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:28] PROBLEM - DPKG on mw1175 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:28] PROBLEM - DPKG on mw1107 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:38] PROBLEM - DPKG on mw1185 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:38] PROBLEM - DPKG on mw1182 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:39] PROBLEM - DPKG on mw1167 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:39] PROBLEM - DPKG on mw1171 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:39] PROBLEM - DPKG on mw1169 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:39] PROBLEM - DPKG on mw1181 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:39] PROBLEM - DPKG on mw1186 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:40] PROBLEM - DPKG on mw1178 is CRITICAL: DPKG CRITICAL dpkg reports broken 
packages [14:19:40] PROBLEM - DPKG on mw1100 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:41] PROBLEM - DPKG on mw1101 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:41] PROBLEM - DPKG on mw1172 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:42] PROBLEM - DPKG on mw1174 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:42] PROBLEM - DPKG on mw1161 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:43] PROBLEM - DPKG on mw1104 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1165 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1168 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1111 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1164 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1103 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1108 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:48] PROBLEM - DPKG on mw1188 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:49] PROBLEM - DPKG on mw1102 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:58] PROBLEM - DPKG on mw1166 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:19:58] PROBLEM - DPKG on mw1162 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:18] PROBLEM - DPKG on mw1220 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:28] PROBLEM - DPKG on mw1211 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:28] PROBLEM - DPKG on mw1219 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:28] PROBLEM - DPKG on mw1213 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:38] PROBLEM - DPKG on mw1216 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:38] PROBLEM - DPKG on mw1218 is CRITICAL: DPKG 
CRITICAL dpkg reports broken packages [14:20:48] PROBLEM - DPKG on mw1212 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:48] PROBLEM - DPKG on mw1210 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:48] PROBLEM - DPKG on mw1209 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:58] PROBLEM - DPKG on mw1214 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:58] PROBLEM - DPKG on mw1217 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:20:58] PROBLEM - DPKG on mw1215 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:21:18] RECOVERY - DPKG on mw1220 is OK: All packages OK [14:21:28] RECOVERY - DPKG on mw1219 is OK: All packages OK [14:21:28] RECOVERY - DPKG on mw1213 is OK: All packages OK [14:21:38] RECOVERY - DPKG on mw1216 is OK: All packages OK [14:21:38] RECOVERY - DPKG on mw1218 is OK: All packages OK [14:21:48] RECOVERY - DPKG on mw1212 is OK: All packages OK [14:21:48] RECOVERY - DPKG on mw1210 is OK: All packages OK [14:21:48] RECOVERY - DPKG on mw1209 is OK: All packages OK [14:21:58] RECOVERY - DPKG on mw1214 is OK: All packages OK [14:21:58] RECOVERY - DPKG on mw1215 is OK: All packages OK [14:21:58] RECOVERY - DPKG on mw1217 is OK: All packages OK [14:22:18] PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100% [14:22:28] RECOVERY - DPKG on mw1112 is OK: All packages OK [14:22:28] RECOVERY - DPKG on mw1183 is OK: All packages OK [14:22:28] RECOVERY - DPKG on mw1175 is OK: All packages OK [14:22:28] RECOVERY - DPKG on mw1211 is OK: All packages OK [14:22:28] RECOVERY - DPKG on mw1107 is OK: All packages OK [14:22:38] RECOVERY - DPKG on mw1182 is OK: All packages OK [14:22:39] RECOVERY - DPKG on mw1185 is OK: All packages OK [14:22:39] RECOVERY - DPKG on mw1171 is OK: All packages OK [14:22:39] RECOVERY - DPKG on mw1169 is OK: All packages OK [14:22:39] RECOVERY - DPKG on mw1178 is OK: All packages OK [14:22:39] RECOVERY - DPKG on mw1186 is OK: All packages OK 
[14:22:39] RECOVERY - DPKG on mw1181 is OK: All packages OK [14:22:40] RECOVERY - DPKG on mw1167 is OK: All packages OK [14:22:40] RECOVERY - DPKG on mw1100 is OK: All packages OK [14:22:41] RECOVERY - DPKG on mw1101 is OK: All packages OK [14:22:41] RECOVERY - DPKG on mw1172 is OK: All packages OK [14:22:42] RECOVERY - DPKG on mw1174 is OK: All packages OK [14:22:42] RECOVERY - DPKG on mw1161 is OK: All packages OK [14:22:43] RECOVERY - DPKG on mw1104 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1165 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1168 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1111 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1164 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1108 is OK: All packages OK [14:22:48] RECOVERY - DPKG on mw1103 is OK: All packages OK [14:22:58] RECOVERY - DPKG on mw1187 is OK: All packages OK [14:22:58] RECOVERY - DPKG on mw1177 is OK: All packages OK [14:22:58] RECOVERY - DPKG on mw1170 is OK: All packages OK [14:22:58] RECOVERY - DPKG on mw1110 is OK: All packages OK [14:22:58] RECOVERY - DPKG on mw1180 is OK: All packages OK [14:22:59] RECOVERY - DPKG on mw1173 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1163 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1109 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1176 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1184 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1105 is OK: All packages OK [14:23:18] RECOVERY - DPKG on mw1179 is OK: All packages OK [14:23:48] RECOVERY - DPKG on mw1188 is OK: All packages OK [14:23:48] RECOVERY - DPKG on mw1102 is OK: All packages OK [14:23:58] RECOVERY - DPKG on mw1166 is OK: All packages OK [14:23:58] RECOVERY - DPKG on mw1162 is OK: All packages OK [14:23:58] RECOVERY - DPKG on mw1113 is OK: All packages OK [14:23:58] RECOVERY - DPKG on mw1106 is OK: All packages OK [14:25:18] RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 
0.93 ms [14:27:58] PROBLEM - DPKG on mw1135 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:27:58] PROBLEM - DPKG on mw1137 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:27:58] PROBLEM - DPKG on mw1131 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:27:58] PROBLEM - DPKG on mw1124 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:27:58] PROBLEM - DPKG on mw1116 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:27:59] PROBLEM - DPKG on mw1132 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:08] PROBLEM - DPKG on mw1133 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:08] PROBLEM - DPKG on mw1139 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:08] PROBLEM - DPKG on mw1122 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:08] PROBLEM - DPKG on mw1117 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:08] PROBLEM - DPKG on mw1144 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1148 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1114 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1147 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1129 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1119 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1143 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:18] PROBLEM - DPKG on mw1118 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:19] PROBLEM - DPKG on mw1130 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:28] PROBLEM - DPKG on mw1127 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:28] PROBLEM - DPKG on mw1142 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:28] PROBLEM - DPKG on mw1146 is CRITICAL: DPKG 
CRITICAL dpkg reports broken packages [14:28:28] PROBLEM - DPKG on mw1141 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1115 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1126 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1125 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1123 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1136 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:39] PROBLEM - DPKG on mw1134 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1138 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1128 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1145 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1121 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1120 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:28:48] PROBLEM - DPKG on mw1140 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:28] PROBLEM - DPKG on mw1199 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:38] PROBLEM - DPKG on mw1190 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:39] PROBLEM - DPKG on mw1193 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:39] PROBLEM - DPKG on mw1206 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:39] PROBLEM - DPKG on mw1191 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:39] PROBLEM - DPKG on mw1194 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:48] PROBLEM - DPKG on mw1195 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:48] PROBLEM - DPKG on mw1203 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:48] PROBLEM - DPKG on 
mw1196 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:48] PROBLEM - DPKG on mw1202 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:58] PROBLEM - DPKG on mw1208 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:58] PROBLEM - DPKG on mw1200 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:58] PROBLEM - DPKG on mw1192 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:29:58] PROBLEM - DPKG on mw1189 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:30:18] PROBLEM - DPKG on mw1205 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:30:39] RECOVERY - DPKG on mw1191 is OK: All packages OK [14:30:39] RECOVERY - DPKG on mw1194 is OK: All packages OK [14:30:48] RECOVERY - DPKG on mw1195 is OK: All packages OK [14:30:48] RECOVERY - DPKG on mw1203 is OK: All packages OK [14:30:48] RECOVERY - DPKG on mw1196 is OK: All packages OK [14:30:48] RECOVERY - DPKG on mw1202 is OK: All packages OK [14:30:58] RECOVERY - DPKG on mw1208 is OK: All packages OK [14:30:58] RECOVERY - DPKG on mw1200 is OK: All packages OK [14:30:58] RECOVERY - DPKG on mw1192 is OK: All packages OK [14:30:58] RECOVERY - DPKG on mw1189 is OK: All packages OK [14:31:18] RECOVERY - DPKG on mw1205 is OK: All packages OK [14:31:28] RECOVERY - DPKG on mw1127 is OK: All packages OK [14:31:28] RECOVERY - DPKG on mw1142 is OK: All packages OK [14:31:28] RECOVERY - DPKG on mw1146 is OK: All packages OK [14:31:28] RECOVERY - DPKG on mw1199 is OK: All packages OK [14:31:38] RECOVERY - DPKG on mw1190 is OK: All packages OK [14:31:38] RECOVERY - DPKG on mw1193 is OK: All packages OK [14:31:39] RECOVERY - DPKG on mw1206 is OK: All packages OK [14:31:48] RECOVERY - DPKG on mw1138 is OK: All packages OK [14:31:48] RECOVERY - DPKG on mw1145 is OK: All packages OK [14:31:48] RECOVERY - DPKG on mw1121 is OK: All packages OK [14:31:48] RECOVERY - DPKG on mw1120 is OK: All packages OK [14:31:48] RECOVERY - DPKG on mw1128 is OK: All packages OK 
[14:31:49] RECOVERY - DPKG on mw1140 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1124 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1131 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1137 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1135 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1116 is OK: All packages OK [14:31:58] RECOVERY - DPKG on mw1132 is OK: All packages OK [14:32:08] RECOVERY - DPKG on mw1133 is OK: All packages OK [14:32:08] RECOVERY - DPKG on mw1122 is OK: All packages OK [14:32:08] RECOVERY - DPKG on mw1139 is OK: All packages OK [14:32:08] RECOVERY - DPKG on mw1117 is OK: All packages OK [14:32:08] RECOVERY - DPKG on mw1144 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1148 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1114 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1119 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1147 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1143 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1118 is OK: All packages OK [14:32:18] RECOVERY - DPKG on mw1129 is OK: All packages OK [14:32:19] RECOVERY - DPKG on mw1130 is OK: All packages OK [14:32:28] RECOVERY - DPKG on mw1141 is OK: All packages OK [14:32:38] RECOVERY - DPKG on mw1126 is OK: All packages OK [14:32:38] RECOVERY - DPKG on mw1115 is OK: All packages OK [14:32:38] RECOVERY - DPKG on mw1123 is OK: All packages OK [14:32:39] RECOVERY - DPKG on mw1136 is OK: All packages OK [14:32:39] RECOVERY - DPKG on mw1134 is OK: All packages OK [14:32:39] RECOVERY - DPKG on mw1125 is OK: All packages OK [14:35:05] (CR) Andrew Bogott: [C: 2] Exclude virbr0 from eth interface tests. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/125183 (owner: 10Andrew Bogott) [14:36:05] (03CR) 10Dzahn: "apergos, this is the alternative to have passive checks but not use snmptrap" [operations/puppet] - 10https://gerrit.wikimedia.org/r/520 (owner: 10Dzahn) [14:36:53] apergos: had to, because of the low number [14:36:58] RECOVERY - check configured eth on virt1003 is OK: NRPE: Unable to read output [14:37:38] RECOVERY - check configured eth on virt1001 is OK: NRPE: Unable to read output [14:37:44] okey dokey [14:37:48] RECOVERY - check configured eth on virt1004 is OK: NRPE: Unable to read output [14:37:53] akosiaris: regarding $::qualifying… sorry I keep not understanding this. If globals can't be set in node definitions, then where are they set? [14:38:04] akosiaris: (this is regarding https://gerrit.wikimedia.org/r/#/c/119488/ in case you want context) [14:38:18] PROBLEM - DPKG on ms-be1 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:18] PROBLEM - DPKG on ms-be9 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:28] PROBLEM - DPKG on ms-be4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:28] PROBLEM - DPKG on ms-be11 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:39] PROBLEM - DPKG on ms-be2 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:39] PROBLEM - DPKG on ms-be8 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:39] PROBLEM - DPKG on ms-be3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:44] !log upgrading all packages & staggered restart of all of swift (ms-fe/ms-be) [14:38:48] PROBLEM - DPKG on ms-be12 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:48] PROBLEM - DPKG on ms-be7 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:48] PROBLEM - DPKG on ms-be6 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:38:50] Logged the message, Master [14:38:52] facts and in things like $::realm, $::site etc [14:38:58] 
PROBLEM - DPKG on ms-be10 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:11] where does $::realm and $::site come from? [14:39:18] PROBLEM - DPKG on ms-be1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:18] PROBLEM - DPKG on ms-be1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:18] PROBLEM - DPKG on ms-be1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:18] PROBLEM - DPKG on ms-be1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:19] realm.pp I think [14:39:28] PROBLEM - DPKG on ms-be1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:28] PROBLEM - DPKG on ms-be1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:39] PROBLEM - DPKG on ms-be1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:39] PROBLEM - DPKG on ms-be1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:39] RECOVERY - DPKG on ms-be3 is OK: All packages OK [14:39:39] RECOVERY - DPKG on ms-be8 is OK: All packages OK [14:39:39] PROBLEM - DPKG on ms-be1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:48] PROBLEM - DPKG on ms-be1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:48] PROBLEM - DPKG on ms-be1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:39:48] RECOVERY - DPKG on ms-be12 is OK: All packages OK [14:39:48] RECOVERY - DPKG on ms-be7 is OK: All packages OK [14:39:49] RECOVERY - DPKG on ms-be6 is OK: All packages OK [14:39:58] RECOVERY - DPKG on ms-be10 is OK: All packages OK [14:40:08] PROBLEM - DPKG on ms-be1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:40:11] (03PS1) 10Giuseppe Lavagetto: Substituting the check_graphite script. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/125726 [14:40:18] RECOVERY - DPKG on ms-be1008 is OK: All packages OK [14:40:18] RECOVERY - DPKG on ms-be1001 is OK: All packages OK [14:40:18] RECOVERY - DPKG on ms-be1012 is OK: All packages OK [14:40:18] RECOVERY - DPKG on ms-be1007 is OK: All packages OK [14:40:18] RECOVERY - DPKG on ms-be9 is OK: All packages OK [14:40:26] akosiaris: ok, but… this is why I feel like I"m going in circles. What's special about realm.pp that allows it to set $::top scope? [14:40:28] RECOVERY - DPKG on ms-be1006 is OK: All packages OK [14:40:28] RECOVERY - DPKG on ms-be1009 is OK: All packages OK [14:40:28] RECOVERY - DPKG on ms-be4 is OK: All packages OK [14:40:28] RECOVERY - DPKG on ms-be11 is OK: All packages OK [14:40:38] RECOVERY - DPKG on ms-be1004 is OK: All packages OK [14:40:38] RECOVERY - DPKG on ms-be1003 is OK: All packages OK [14:40:39] PROBLEM - DPKG on formey is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:40:39] RECOVERY - DPKG on ms-be2 is OK: All packages OK [14:40:39] RECOVERY - DPKG on ms-be1002 is OK: All packages OK [14:40:40] nothing special. [14:40:48] RECOVERY - DPKG on ms-be1010 is OK: All packages OK [14:40:48] RECOVERY - DPKG on ms-be1011 is OK: All packages OK [14:41:01] but node level variables should be in a different scope [14:41:07] and not the global scope [14:41:08] RECOVERY - DPKG on ms-be1005 is OK: All packages OK [14:41:34] hmm [14:41:39] let me rephrase that [14:41:39] (03PS1) 10ArielGlenn: ms1001 should no longer sync from nas, long since obsolete [operations/puppet] - 10https://gerrit.wikimedia.org/r/125727 [14:41:50] (03CR) 10Dzahn: "NSCA = Nagios Service Check Acceptor" [operations/puppet] - 10https://gerrit.wikimedia.org/r/520 (owner: 10Dzahn) [14:42:05] node level variables are in node-level scope and should not be in global scope [14:42:07] akosiaris: http://lists.wikimedia.org/pipermail/wikitech-l/2014-January/074222.html ? [14:42:10] does this make more sense ? 
[14:42:34] that mail is technically incorrect [14:42:49] :( How can they both be correct? [14:42:53] http://docs.puppetlabs.com/guides/scope_and_puppet.html [14:42:55] (03CR) 10Dzahn: "it uses crypto and there were iptables (pre-ferm but puppetized) rules:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/520 (owner: 10Dzahn) [14:43:02] local, inherited, node-level, global [14:43:28] so a variable will be looked up correctly in puppet 3 in node level scope [14:43:35] <_joe_> akosiaris: citing the puppet scope page as a clarificator is pure trolling :) [14:43:38] PROBLEM - HTTP on formey is CRITICAL: Connection refused [14:43:39] RECOVERY - DPKG on formey is OK: All packages OK [14:43:48] _joe_: thank you kind sir :-) [14:43:58] PROBLEM - DPKG on ms-fe4 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:44:03] (03Abandoned) 10ArielGlenn: ms1001 should no longer sync from nas, long since obsolete [operations/puppet] - 10https://gerrit.wikimedia.org/r/125727 (owner: 10ArielGlenn) [14:44:07] <_joe_> akosiaris: joking of course :) [14:44:14] same here :-) [14:44:18] PROBLEM - DPKG on ms-fe3 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:44:32] anyway, what ryan says is true albeit the wording is wrong [14:44:38] PROBLEM - DPKG on ms-fe1 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:44:48] PROBLEM - DPKG on ms-fe2 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:44:51] (03PS1) 10ArielGlenn: ms1001 should no longer sync from nas, long since obsolete [operations/puppet] - 10https://gerrit.wikimedia.org/r/125728 [14:44:58] that means that yes, variables will be looked up correctly if defined in node-level scope [14:45:26] but not because they are global, but rather because puppet will look up variables in node-level [14:45:41] (03CR) 10ArielGlenn: [C: 032] ms1001 should no longer sync from nas, long since obsolete [operations/puppet] - 10https://gerrit.wikimedia.org/r/125728 (owner: 10ArielGlenn) [14:45:51] andrewbogott: 
does this make more sense now ? [14:45:56] bah too much in a hurry [14:45:59] cme on gerrit [14:46:39] RECOVERY - DPKG on ms-fe1 is OK: All packages OK [14:46:48] RECOVERY - DPKG on ms-fe2 is OK: All packages OK [14:46:48] PROBLEM - DPKG on ms-fe1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:46:58] RECOVERY - DPKG on ms-fe4 is OK: All packages OK [14:47:16] btw this does indeed mean this code will continue working in puppet 3 and that puppet is wrong to whine on this error. [14:47:18] PROBLEM - DPKG on ms-fe1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:47:18] RECOVERY - DPKG on ms-fe3 is OK: All packages OK [14:47:23] akosiaris: Um... [14:47:33] I'm trying to find a style guide that explains how we /should/ do things like this. [14:47:52] I wrote one long ago, but can't find it, maybe because it was wrong for puppet 3. But in that case we need something to replace it [14:47:53] * andrewbogott digs [14:48:13] <_joe_> sorry andrewbogott what commit are you talking about? [14:48:23] https://gerrit.wikimedia.org/r/#/c/119488/3/manifests/role/nova.pp [14:48:33] that btw has me thinking... [14:48:38] !log upgrading all snapshot* hosts [14:48:44] Logged the message, Master [14:48:48] I have stopped trusting the warnings from puppet 2.x [14:49:09] I think I will setup a machine with puppet 3 and recompile all our catalogs there [14:49:13] Oh, nevermind… https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization [14:49:18] RECOVERY - DPKG on ms-fe1001 is OK: All packages OK [14:49:30] apergos: is there a special process to restart php on the snapshot boxes? [14:49:38] So, everyone seems to agree that 4. is wrong. But I'm completely baffled by what we /should/ do in that case. [14:49:40] no [14:49:48] RECOVERY - DPKG on ms-fe1002 is OK: All packages OK [14:49:59] the next run of any script will pick it up [14:50:10] all the dumps are multiple steps [14:50:40] manybubbles: You going to handle the SWAT deploy today? 
[14:50:44] <_joe_> akosiaris: we should think about how to migrate - maybe a new master with puppet3 and a different branch of operations/puppet for testing? [14:50:48] RECOVERY - check configured eth on virt1002 is OK: NRPE: Unable to read output [14:50:58] akosiaris: so, that section (which I wrote) reflected our design at the time. Which is that variables were defined at node-level, used at role-level and passed as params to any classes included from roles. [14:51:05] anomie: may as well because it is my code. have you been running them for the past while? I haven't been paying good attention [14:51:11] akosiaris: If that's no longer the proper approach, can you please revise that page accordingly? [14:51:14] _joe_: heh, that tooo, but first let's get to a point where the catalogs actually compile [14:51:18] manybubbles: I've been checking, and no one has used our window for a while [14:51:18] PROBLEM - DPKG on snapshot1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:18] PROBLEM - DPKG on snapshot1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:39] PROBLEM - DPKG on snapshot1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:39] PROBLEM - DPKG on snapshot1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:48] PROBLEM - DPKG on ms-fe1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:51:58] PROBLEM - DPKG on ms-fe1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:52:01] <_joe_> akosiaris: right :) [14:52:21] andrewbogott: still correct. What should I update ? Btw I am almost certain that change is no good [14:52:40] akosiaris: OK, so if I understand properly... [14:52:45] A node can define $foo = 'bananas' [14:52:49] And include role::bar [14:52:55] role::bar should reference $foo [14:53:02] as $foo /not/ as $::foo [14:53:04] correct? [14:53:06] yes [14:53:07] :-) [14:53:12] awesome. [14:53:21] I'll make a note of that, and then reject that patch. 
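[editor's note] The lookup order described above (local, inherited, node-level, global) can be illustrated with a toy model. This is only a sketch in Python, not Puppet code and not how Puppet is implemented. The scope dictionaries, the helper names `lookup` and `lookup_global`, and the values for `realm` and `site` are hypothetical; only `$foo = 'bananas'` and `role::bar` come from the conversation. Inherited scope is left out for brevity.

```python
from collections import ChainMap

# Toy model only: each scope is a plain dict. The values for realm and site
# are made up for illustration.
top_scope = {"realm": "production", "site": "eqiad"}  # e.g. set in realm.pp
node_scope = {"foo": "bananas"}                        # $foo = 'bananas' in the node
role_scope = {}                                        # role::bar sets nothing itself


def lookup(name, *scopes):
    """Resolve an unqualified $name, searching the innermost scope first."""
    return ChainMap(*scopes)[name]


def lookup_global(name):
    """Resolve a qualified $::name, which consults top scope only."""
    return top_scope[name]


# Inside role::bar, $foo is found in node scope and $site in top scope.
print(lookup("foo", role_scope, node_scope, top_scope))
print(lookup("site", role_scope, node_scope, top_scope))

# $::foo fails in this model because foo was never set at top scope.
try:
    lookup_global("foo")
except KeyError:
    print("$::foo is not defined at top scope")
```

In this model the unqualified `$foo` resolves because the search passes through node scope, not because the variable is global, which is the distinction akosiaris draws.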
[14:53:23] Thank you! [14:53:26] <_joe_> yes but that's *bad* [14:54:06] <_joe_> anything needed in a class as configuration should either be passed as a parameter or via hiera in some cases [14:54:20] akosiaris: "Role classes are never parameterized, and are only configured via node variables. Note that within role classes these variables are referred to with node scope ($foo) rather than global scope ($::foo)." [14:54:29] _joe_: Sure, but we still need a pre-heira styleguide. [14:54:48] <_joe_> andrewbogott: IMO, I'd use parameters on classes [14:54:48] RECOVERY - DPKG on ms-fe1003 is OK: All packages OK [14:54:54] andrewbogott: yey! [14:54:56] _joe_: Among other things, that style is required for the current labs web interface to work. [14:54:58] RECOVERY - DPKG on ms-fe1004 is OK: All packages OK [14:55:08] <_joe_> andrewbogott: I see, sigh. [14:55:19] _joe_: Currently in our style roles never have params. It's not ideal but… one thing at a time. [14:55:41] Getting from that style to heira should be pretty straightforward, no need to parameterize roles in between, right? [14:55:42] <_joe_> andrewbogott: ok agreed, mine was a general advice [14:55:49] Unless I totally misunderstand how heira works [14:55:55] I think so too [14:56:01] <_joe_> andrewbogott: it should be easy, yes [14:56:14] * andrewbogott makes a note about that on the wiki, too [14:56:17] thanks all [14:56:18] RECOVERY - DPKG on snapshot1001 is OK: All packages OK [14:56:18] RECOVERY - DPKG on snapshot1003 is OK: All packages OK [14:56:28] RECOVERY - check configured eth on virt1005 is OK: NRPE: Unable to read output [14:56:38] RECOVERY - DPKG on snapshot1002 is OK: All packages OK [14:56:38] RECOVERY - DPKG on snapshot1004 is OK: All packages OK [14:56:48] <_joe_> andrewbogott: you will probably need to play a little with hiera lookup orders in some cases. [14:57:13] <_joe_> but in general, it should be flawless. 
[14:58:50] (03CR) 10Andrew Bogott: [C: 04-2] "I've just discussed this with a couple of other ops and updated our style guide accordingly:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119488 (owner: 10Matanya) [14:58:55] (03CR) 10Dzahn: "let's see how the attempts to import RT into phabricator go, then it would turn into "Kanban chart for phab"?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111152 (owner: 10Diederik) [14:59:25] Only 35 patches to go [14:59:31] * andrewbogott hasn't done code review in a month [14:59:39] RECOVERY - check configured eth on virt1007 is OK: NRPE: Unable to read output [14:59:55] hahaha great recovery message... [15:00:01] <_joe_> andrewbogott: if you need help, I may be able to help [15:00:15] _joe_: Thanks. I bet that they're mostly just lint fixes. [15:00:28] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.000277777777778 [15:00:37] apergos: Yeah, I wonder if that means that I just 100% broke that test :( [15:01:35] It involved writing a regexp which means that my odds of succeeding were low [15:02:02] uugghh I see [15:02:32] (03CR) 10Dzahn: "so Kanban chart = Toyota JIT production , heh http://en.wikipedia.org/wiki/Kanban" [operations/puppet] - 10https://gerrit.wikimedia.org/r/111152 (owner: 10Diederik) [15:02:36] apergos: https://gerrit.wikimedia.org/r/#/c/125183/1/modules/base/templates/check_eth.erb [15:02:56] * manybubbles has the conch [15:03:27] what happens if you run the check by hand on that host? [15:05:48] RECOVERY - check configured eth on virt1006 is OK: NRPE: Unable to read output [15:06:21] apergos: it bombs in a way unrelated to my change… [15:06:22] ./check_eth: 21: [: -lt: unexpected operator [15:06:24] andrewbogott: fwiw.. 
i saw this earlier [15:06:32] /sbin/ethtool lo | awk '/Speed:/ {gsub("Mb/s","",);print }' [15:06:32] CONF: REQ: 1000 [15:06:43] on mw1057 [15:06:48] So it's broken but I didn't break it [15:06:52] * andrewbogott counts that as a win [15:07:13] well, yes and no [15:07:18] it also worked fine and found hosts [15:07:27] that have lower speed than they should [15:08:20] mutante: In some cases CONF_SPEED='' [15:08:22] andrewbogott: it made me create https://rt.wikimedia.org/Ticket/Display.html?id=7266 [15:08:27] that's causing the test to misbehave [15:08:32] and those were correct warnings [15:09:02] !log manybubbles synchronized php-1.23wmf21/extensions/CirrusSearch/ 'SWAT deploy to improve performance' [15:09:04] but i also saw what you just said, on at least one host CONF_SPEED was empty [15:09:07] ack [15:09:08] Logged the message, Master [15:09:10] mutante: I feel like you're talking about something else...? [15:09:15] Ah, ok. [15:09:26] root@mw1057:~# /sbin/ethtool eth0 | grep Speed Speed: 100Mb/s [15:09:36] mw1057 f.e. is actually 100MB/s [15:09:44] (03CR) 10Manybubbles: [C: 032] Switch Cirrus to a faster query type [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125410 (owner: 10Manybubbles) [15:09:49] /sbin/ethtool lo [15:09:49] Settings for lo: [15:09:49] Link detected: yes [15:09:54] that is the reason [15:10:00] (03Merged) 10jenkins-bot: Switch Cirrus to a faster query type [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/125410 (owner: 10Manybubbles) [15:10:11] andrewbogott: care adding loopback interface in that regexp ? [15:10:26] yeah, but I don't think that's all... [15:10:27] somehow I missed it [15:10:58] 'NRPE: Unable to read output [15:11:05] that should be a different issue [15:11:10] !log manybubbles synchronized wmf-config/CirrusSearch-common.php 'SWAT Cirrus update to improve performance' [15:11:16] Logged the message, Master [15:12:19] akosiaris: It's erroring out for br1002 for the same reason... 
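[editor's note] The failure discussed above comes from comparing an empty speed value: `ethtool lo` prints no `Speed:` line, so the numeric test has nothing to compare. One possible guard is sketched below in Python rather than in the shell of the real check_eth template. The function names, the sample ethtool output strings, and the choice of CRITICAL for a slow link are assumptions for illustration; exit codes 0 and 2 follow the usual Nagios plugin convention.

```python
import re


def parse_speed_mbps(ethtool_output):
    """Return the link speed in Mb/s, or None if no numeric Speed line exists."""
    match = re.search(r"^\s*Speed:\s*(\d+)Mb/s", ethtool_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_speed(ethtool_output, required_mbps):
    """Return (exit_code, message); 0 is OK and 2 is CRITICAL."""
    speed = parse_speed_mbps(ethtool_output)
    if speed is None:
        # Interfaces such as lo report no speed, so there is nothing to compare.
        return 0, "OK: interface reports no speed, skipping check"
    if speed < required_mbps:
        return 2, f"CRITICAL: {speed}Mb/s is below required {required_mbps}Mb/s"
    return 0, f"OK: speed {speed}Mb/s"


# Hypothetical sample outputs, shaped like the ethtool snippets pasted above.
LO_OUTPUT = "Settings for lo:\n\tLink detected: yes\n"
ETH0_OUTPUT = "Settings for eth0:\n\tSpeed: 100Mb/s\n\tLink detected: yes\n"

print(check_speed(LO_OUTPUT, 1000))
print(check_speed(ETH0_OUTPUT, 1000))
```

Returning `None` for a missing speed keeps the "nothing to measure" case separate from the "too slow" case, so a host like mw1057 at 100Mb/s still alerts while lo does not.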
[15:12:37] * manybubbles puts down the conch [15:12:54] * closedmouth drops a rock on manybubbles's head [15:13:22] * manybubbles dies [15:14:46] (03CR) 10Dzahn: "there is a typo in there, see "EDIT_CODE" vs. "EXIT_CODE"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125183 (owner: 10Andrew Bogott) [15:15:01] andrewbogott: EDIT_CODE=1 [15:15:05] EXIT_CODE=2 [15:15:09] ahahaha [15:15:45] there are many things wrong with this test [15:15:48] but that is a big one! [15:21:56] (03CR) 10Giuseppe Lavagetto: [C: 031] "Seems correct, but please check." (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/118966 (owner: 10Matanya) [15:22:00] (03PS1) 10Andrew Bogott: Several fixes for check_eth: [operations/puppet] - 10https://gerrit.wikimedia.org/r/125732 [15:22:01] mutante: ^ [15:22:02] (03PS1) 10Dzahn: check_eth - fix typo and use 3 for UNKOWN in icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/125733 [15:22:05] (03PS1) 10Alexandros Kosiaris: Dont check speed on ifaces that dont report one [operations/puppet] - 10https://gerrit.wikimedia.org/r/125734 [15:22:08] oops, heh, ok [15:22:15] hehe [15:22:16] Um, did we all three write a patch at the same time? [15:22:32] seems so :) [15:22:45] Alex's is the best imo [15:22:54] <_joe_> now have fun merging them :) [15:22:59] !log restarting virt0's salt-master, glance-api, glance-registry, keystone, nova-scheduler [15:23:01] Logged the message, Master [15:23:45] (03Abandoned) 10Andrew Bogott: Several fixes for check_eth: [operations/puppet] - 10https://gerrit.wikimedia.org/r/125732 (owner: 10Andrew Bogott) [15:24:37] * andrewbogott wonders what it would mean if lo had NO CARRIER [15:24:57] (03CR) 10Dzahn: "can we use exit code 3 ? 
see http://docs.icinga.org/latest/en/pluginapi.html#returncode" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125734 (owner: 10Alexandros Kosiaris) [15:24:57] ahh i have seen that [15:25:25] mutante: you can use for unknown things, but is this really unknown ? [15:25:46] so andrewbogott, lo down is a bad bad thing [15:25:55] yep, so testing for it (as your patch still does) is good. [15:25:56] yes, the comment is "This should never happen." :) [15:25:57] things misbehave in ways you can not possibly imagive [15:26:04] so then it's unknown [15:26:10] imagine* [15:26:17] 'this should never happen' = network card came unseated. Definitely a critical! [15:26:20] (03CR) 10Giuseppe Lavagetto: "+1 to Dzahn." [operations/puppet] - 10https://gerrit.wikimedia.org/r/125734 (owner: 10Alexandros Kosiaris) [15:27:11] <_joe_> akosiaris: whenever a result in nagios cannot be interpreted, I'd use UNKNOWN (and alarm on it) [15:27:32] heh, that is correctly technically, socially... good luck [15:27:42] unknowns always get ignored [15:27:55] If an interface vanishes that test will fail until the next puppet run, then we'll stop running the test [15:28:39] (03CR) 10Dzahn: [C: 031] "ok, convinced by "correctly technically, socially... good luck"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125734 (owner: 10Alexandros Kosiaris) [15:29:06] (03CR) 10Faidon Liambotis: [C: 032] "UNKNOWN still appears as an alert, for example on Icinga's web interface. 
The status code exists for when there is something to check, but" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125734 (owner: 10Alexandros Kosiaris) [15:36:42] (03CR) 10Alexandros Kosiaris: [C: 04-1] protoproxy: call enable_ipv6_proxy in a sane way (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/118966 (owner: 10Matanya) [15:36:47] (03Abandoned) 10Dzahn: check_eth - fix typo and use 3 for UNKOWN in icinga [operations/puppet] - 10https://gerrit.wikimedia.org/r/125733 (owner: 10Dzahn) [15:38:13] <_joe_> akosiaris: you're probably right, when I see unknown I usually assume something bad is happening :) [15:38:31] <_joe_> be back in 10 minutes. [15:38:59] btw, check the 2 actual UNKNOWN we currently have (unrelated, SSL cert checks on fluorine and antimony) [15:39:02] also bbl [15:39:39] s/fluorine/formey [15:41:41] (03CR) 10Andrew Bogott: "Sorry that it took me ages to look at this. If you rebase I'll try to read it soon. Can you explain what you mean about 'this package ca" [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/68935 (owner: 10AzaToth) [15:44:07] (03CR) 10Andrew Bogott: [C: 04-2] "I believe this is moot due to the fix to https://rt.wikimedia.org/Ticket/Display.html?id=80" [operations/puppet] - 10https://gerrit.wikimedia.org/r/107424 (owner: 10Gage) [15:45:00] (03CR) 10Andrew Bogott: [C: 04-2] "I believe this is unneeded. See also https://gerrit.wikimedia.org/r/#/c/119488/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/97007 (owner: 10ArielGlenn) [15:46:14] ottomata: have you seen the stat1/stat1003 cronspam? there's a bunch of errors there too, like rsync failing [15:46:36] (03CR) 10Dzahn: [C: 04-2] "indeed. virt0 has all the standard NRPE checks now. 
see https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=virt0" [operations/puppet] - 10https://gerrit.wikimedia.org/r/107424 (owner: 10Gage) [15:47:09] (03PS1) 10Ottomata: Allowing rsync from stat1001 as well as internal networks on stat1003 [operations/puppet] - 10https://gerrit.wikimedia.org/r/125736 [15:47:22] paravoid: I have not seen the cronspam, but I see that now too [15:47:27] since turning on firewall on friday [15:47:41] rsync was only allowed on internal networks, but stat1001.wikimedia.org needed to get through [15:48:10] paravoid, I had to set upa filter to send cronspam to trash: gmail is too slow for me, and os x mail seems to only hold ~1000 emails per account locally [15:48:29] i was missing emails becasue I only had the last few days worth in my inbox [15:49:23] (03CR) 10Dzahn: "adding ottomata because of the "found analytics data" comment from Sean above" [operations/dns] - 10https://gerrit.wikimedia.org/r/122412 (owner: 10Matanya) [15:50:08] (03PS2) 10Ottomata: Allowing rsync from public stat* servers, as well as internal [operations/puppet] - 10https://gerrit.wikimedia.org/r/125736 [15:50:20] (03CR) 10Alexandros Kosiaris: move LDAP admin permissions,tools out of site.pp (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [15:50:52] ottomata: also a bunch of ezachte's cronjobs [15:51:10] hm [15:51:17] his probably do rsyncs too [15:51:30] (03CR) 10Ottomata: [C: 032 V: 032] Allowing rsync from public stat* servers, as well as internal [operations/puppet] - 10https://gerrit.wikimedia.org/r/125736 (owner: 10Ottomata) [15:51:35] did somebody stop http on formey on purpose or did it just happen [15:51:45] and .. 
do we still need it [15:51:55] mutante: I upgraded it [15:52:13] * Starting web server apache2 Syntax error on line 56 of /etc/apache2/sites-enabled/000-svn: [15:52:20] SSLCertificateFile: file '/etc/ssl/certs/svn.wikimedia.org.pem' does not exist or is empty [15:53:12] but svn is antimony [15:53:17] so it can't be so bad [15:54:03] we can just remove the monitoring, there should just be the LDAP stuff above left [15:55:26] if that's the case, we probably need to remove all this stale apache config as well :) [15:56:50] ottomata: did you add openjdk to reprepro? [15:57:15] (03CR) 10Dzahn: move LDAP admin permissions,tools out of site.pp (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [15:58:07] (03PS1) 10Ottomata: Can't use domain names because ip6tables gets upset [operations/puppet] - 10https://gerrit.wikimedia.org/r/125739 [15:58:08] paravoid: _afaik_ formey just has the "LDAP operations tools" role left, or it would be shut down already. the DNS entry for SVN does not point to it anymore.. so.. yes, we do [15:58:11] paravoid, yes [15:58:19] ottomata: why? [15:58:19] manybubbles needed that specific version for elasticsearch [15:58:27] seems older than what Ubuntu has [15:58:39] a bunch of hosts want to downgrade openjdk now [15:58:44] oh crap beacuse we pin it [15:58:46] hm [15:59:09] sorry about that! _51 is broken for modern versions of lucene. [15:59:12] like crashes broken [15:59:15] can we make an apt change to prefer ubuntu's openjdk specifically? [15:59:33] elasticsearch puppet is manually specifying that version [15:59:42] so it wont' hurt elsewhere as long as the default is ubuntu's [16:00:09] manybubbles: is this reported anywhere? 
[16:00:24] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [16:00:32] (03CR) 10Ottomata: [C: 032 V: 032] Can't use domain names because ip6tables gets upset [operations/puppet] - 10https://gerrit.wikimedia.org/r/125739 (owner: 10Ottomata) [16:00:41] upstream I mean [16:01:03] paravoid: https://issues.apache.org/jira/browse/LUCENE-5212 and https://bugs.openjdk.java.net/browse/JDK-8024830 [16:01:58] paravoid: http://wiki.apache.org/lucene-java/JavaBugs has a list of them [16:04:04] wow [16:04:08] impressive [16:04:12] (the bug) [16:04:44] (03CR) 10Dzahn: "robla,sumanah,reedy,chad, do you actually use these permissions nowadays? trying to move LDAP admins into a class instead of directly on f" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [16:05:30] "I think it may be related to register masks we use on AVX machine" oh, those are words I know [16:05:47] yeah, it gets deep into hotspot [16:07:29] debian/ubuntu track icedtea [16:08:27] (03PS5) 10Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 [16:08:35] ...and I have no idea how one can ask icedtea to backport this [16:08:37] manybubbles: shall I start repartitioning elastic1006? [16:08:44] ottomata: sure! 
[16:09:05] paravoid: they'll get it when they go to 1.8 I imagine [16:09:14] http://blog.fuseyism.com/index.php/2014/03/28/icedtea-2-3-14-2-4-6-considered-armful-released/ [16:09:19] they definitely seem like backporting a bunch of stuff [16:09:24] ok, moving shards off of 1006 [16:09:36] ETOOMANYBUGTRACKERS [16:09:59] icedtea.classpath.org/bugzilla/, bugs.sun.com, bugs.openjdk.java.net [16:10:05] I'm getting all confused [16:10:05] (03PS1) 10Dzahn: remove subversion role from formey [operations/puppet] - 10https://gerrit.wikimedia.org/r/125741 [16:11:20] (03PS2) 10Dzahn: remove subversion role from formey [operations/puppet] - 10https://gerrit.wikimedia.org/r/125741 [16:11:24] paravoid: too many. I'm not actually sure if Elasticsearch has a bug for this. Directions not to use the version of java and pointers to the lucene bug, sure.... [16:12:08] did this actually crash for us? [16:13:34] RECOVERY - HTTP on formey is OK: HTTP OK: HTTP/1.1 302 Found - 453 bytes in 0.072 second response time [16:13:40] (03PS1) 10Ottomata: Adding ensure parameter to varnish::logging [operations/puppet] - 10https://gerrit.wikimedia.org/r/125742 [16:13:41] paravoid: not for us, no. I've never tried running that version because of the bug reports [16:13:51] !log deleted old svn apache config on formey, started apache [16:13:57] Logged the message, Master [16:14:28] I didn't feel it was worth trying. 
we certainly use that code it mentions in the bug report [16:15:19] (03CR) 10Dzahn: [C: 032] remove subversion role from formey [operations/puppet] - 10https://gerrit.wikimedia.org/r/125741 (owner: 10Dzahn) [16:18:36] (03PS1) 10Ottomata: Setting up varnishncsa instance for erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/125743 [16:19:39] (03CR) 10jenkins-bot: [V: 04-1] Setting up varnishncsa instance for erbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/125743 (owner: 10Ottomata) [16:20:26] (03PS1) 10Ottomata: Changing erbium's udp2log instance to use unicast on port 8419 [operations/puppet] - 10https://gerrit.wikimedia.org/r/125744 [16:21:25] (03CR) 10jenkins-bot: [V: 04-1] Changing erbium's udp2log instance to use unicast on port 8419 [operations/puppet] - 10https://gerrit.wikimedia.org/r/125744 (owner: 10Ottomata) [16:21:33] brb, lunch [16:29:34] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [16:34:47] interesting... [16:34:56] bblack: is that you? [16:36:12] oh fuck [16:36:40] mutante: did you completely decom brewster? [16:36:41] (03CR) 10Chad: [C: 04-1] move LDAP admin permissions,tools out of site.pp (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/125721 (owner: 10Dzahn) [16:36:45] can we ressurect it from the dead? [16:37:03] there isnt remote wiping really [16:37:07] so i imagine we should be able to do so [16:37:38] (and the drac isnt remotely reset, thats done post wipe, so it should have its remote interface still accessible) [16:38:04] https://gerrit.wikimedia.org/r/#/c/125695/ [16:38:14] https://gerrit.wikimedia.org/r/123626 [16:38:25] (03PS1) 10Reedy: Revert "decom : brewster" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125748 [16:38:34] (03PS1) 10Reedy: Revert "decom: remove brewster incl. 
mgmt" [operations/dns] - 10https://gerrit.wikimedia.org/r/125749 [16:38:42] thx Reedy =] [16:39:31] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [16:40:00] paravoid: i can start it [16:40:03] we're about to lose all of ulsfo [16:40:06] among others [16:40:10] oh crap [16:40:41] connects to mgmt [16:40:45] !log powering up brewster [16:40:50] Logged the message, Master [16:40:54] <_joe|away> paravoid: why so? [16:40:57] ok, currently in use, you beat e [16:41:04] @ connect com2 [16:41:11] because the ulsfo boxes (among others) are messed up [16:41:22] auto eth0 [16:41:22] iface eth0 inet dhcp [16:41:26] root@cp4016:~# ip r ls [16:41:30] root@cp4016:~# [16:41:33] Apr 14 13:09:04 cp4016 dhclient: DHCPREQUEST of 10.128.0.116 on eth0 to 208.80.152.171 port 67 [16:41:36] Apr 14 13:10:01 dhclient: last message repeated 3 times [16:41:39] Apr 14 13:10:08 cp4016 dhclient: DHCPREQUEST of 10.128.0.116 on eth0 to 208.80.152.171 port 67 [16:41:42] Apr 14 13:11:01 dhclient: last message repeated 4 times [16:42:23] (03CR) 10Faidon Liambotis: [C: 032] Revert "decom: remove brewster incl. mgmt" [operations/dns] - 10https://gerrit.wikimedia.org/r/125749 (owner: 10Reedy) [16:43:28] (03PS2) 10Dzahn: Revert "decom : brewster" [operations/puppet] - 10https://gerrit.wikimedia.org/r/125748 (owner: 10Reedy) [16:43:30] mutante: what's going on on the console? [16:44:05] paravoid: com2 is currently in use, i expected that to be you because you already loggeed starting it [16:44:16] reset it [16:44:17] if not, i'll reset drac [16:44:22] I wasn't, I just reset drac [16:44:44] yep, i see it restarting [16:44:56] drac itself that is [16:45:01] PROBLEM - Host carbon is DOWN: CRITICAL - Host Unreachable (208.80.154.10) [16:45:13] carbon is one of the others that had inet dhcp... 
[16:45:15] <_joe|away> mutante: or, if we have an IPMI password set [16:45:27] <_joe|away> you can just power-cycle it [16:45:41] !log powering brewster back on [16:45:45] Logged the message, Master [16:46:19] it's coming back, i watch it [16:46:44] SATAaPortaAdhardkdiskodriverfailure.ror [16:47:04] hits F1, remembering old tickets about broken disk in brewster [16:47:35] (PS2) 01tonythomas: Update all Bugzilla custom files which have only trivial changes to use MPL 2.0 [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 [16:47:40] it tries DHCP now [16:47:43] _joe|away, paravoid: graphite is a lot happier now with https://gerrit.wikimedia.org/r/#/c/125686/ btw [16:47:46] :/ [16:47:54] brewster tries dhcp ? [16:47:59] for the love of god [16:48:01] RECOVERY - Disk space on virt1000 is OK: DISK OK [16:48:41] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 74.27 ms [16:48:59] I assume this is because brewster is up ? [16:49:05] (CR) 01tonythomas: "@Dzhan : looks good now ?" [wikimedia/bugzilla/modifications] - https://gerrit.wikimedia.org/r/119726 (owner: 01tonythomas) [16:49:07] no, this is me [16:49:12] preparing plan b [16:49:13] damn [16:49:35] <_joe_> ori: yep :) [16:49:47] yes, plan b please, disk broken!? arg [16:49:53] SATAaPortaAdhardkdiskodriverfailure.ror [16:50:02] sorry for the ping, didn't realize there was a minor outage [16:50:33] <_joe_> ori: not that minor, it seems [16:50:52] <_joe_> paravoid: can't we set up a dhcp server somewhere else? [16:52:26] hooft [16:52:58] carbon is also ready [16:53:11] PROBLEM - Host lvs4003 is DOWN: PING CRITICAL - Packet loss = 100% [16:53:11] i am fixing /etc/network/interfaces manually right now [16:53:15] dammit [16:53:20] SATA Port A device not found. [16:53:20] SATA Port C device not found.
[16:53:23] I'll killall -9 dhclient3 [16:53:35] good idea [16:53:41] <_joe_> paravoid: that could work for now [16:54:44] (PS2) Ottomata: Setting up varnishncsa instance for erbium [operations/puppet] - https://gerrit.wikimedia.org/r/125743 [16:55:42] manybubbles: shards are off of elastic1006 [16:55:46] reinstalling... [16:55:50] ottomata1: sweet! [16:57:04] carbon.wikimedia.org gadolinium.wikimedia.org hafnium.wikimedia.org labsdb1001.eqiad.wmnet labsdb1002.eqiad.wmnet labsdb1003.eqiad.wmnet labstore1001.eqiad.wmnet searchidx1001.eqiad.wmnet ssl1005.wikimedia.org ssl1006.wikimedia.org ssl1009.wikimedia.org virt1001.eqiad.wmnet ytterbium.wikimedia.org [16:57:07] !log shutting down elastic1006 for reinstall [16:57:09] is the rest of the list [16:57:14] Logged the message, Master [16:57:17] fffs [16:57:26] ok, let's restore lvs4003 now [16:57:32] and the cp400Xs ? [16:57:39] and the cps that died [16:58:23] akosiaris: how's carbon? [16:59:01] PROBLEM - Host elastic1006 is DOWN: PING CRITICAL - Packet loss = 100% [16:59:31] RECOVERY - Host carbon is UP: PING OK - Packet loss = 0%, RTA = 0.83 ms [16:59:42] paravoid: I think that answer the question [17:00:31] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.0588888888889 [17:02:57] mutante: I suppose brewster wont come up ? [17:03:15] (PS4) Andrew Bogott: Enable a labs site override option for nova config [operations/puppet] - https://gerrit.wikimedia.org/r/102185 (owner: Ryan Lane) [17:03:22] crisis averted for now [17:03:26] let's just fix the issue now though [17:03:49] akosiaris: no, i tried several times and went to BIOS etc [17:03:58] had we switched over the dhcp relays to carbon ? [17:04:05] nope [17:04:08] damn [17:04:08] SATA controller sometimes sees disk in BIOS , sometimes it doesnt [17:04:11] RECOVERY - Host elastic1006 is UP: PING OK - Packet loss = 0%, RTA = 0.23 ms [17:04:12] should I check?
[17:04:16] cant get it to boot [17:04:18] I should have thought it [17:04:27] it wouldn't matter much anyway [17:04:44] we'd lose carbon, then the same turn of events would happen [17:06:07] i'll move cr1/2-ulsfo to carbon now [17:06:11] PROBLEM - Disk space on elastic1006 is CRITICAL: Connection refused by host [17:06:16] already halfway through that [17:06:21] PROBLEM - RAID on elastic1006 is CRITICAL: Connection refused by host [17:06:22] ok [17:06:26] * mark backs out [17:06:31] heh, sorry [17:06:31] PROBLEM - DPKG on elastic1006 is CRITICAL: Connection refused by host [17:06:31] PROBLEM - check configured eth on elastic1006 is CRITICAL: Connection refused by host [17:06:41] PROBLEM - puppet disabled on elastic1006 is CRITICAL: Connection refused by host [17:06:41] PROBLEM - ElasticSearch health check on elastic1006 is CRITICAL: CRITICAL - Could not connect to server 10.64.0.113 [17:06:41] PROBLEM - SSH on elastic1006 is CRITICAL: Connection refused [17:07:03] so a) we need to find out why, b) fix all these machines [17:07:04] (PS5) Andrew Bogott: Enable a labs site override option for nova config [operations/puppet] - https://gerrit.wikimedia.org/r/102185 (owner: Ryan Lane) [17:07:40] I suppose cp400X are easy to figure out, we had that subnets problem in dhcpd configs [17:07:40] broken netboot config at time of install? [17:07:41] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 75.14 ms [17:07:44] that's what happened for ulsfo [17:07:46] yes [17:07:49] carbon, same story [17:07:55] it was the very first box that got installed in eqiad [17:07:55] ? [17:08:00] nope [17:08:06] i just reinstalled recently [17:08:08] unless it was reinstalled?
[17:08:08] ok [17:08:25] I installed the swift boxes right after and that worked fine though [17:08:31] RECOVERY - Host lvs4003 is UP: PING OK - Packet loss = 0%, RTA = 74.13 ms [17:08:51] (CR) Andrew Bogott: [C: 2] Enable a labs site override option for nova config [operations/puppet] - https://gerrit.wikimedia.org/r/102185 (owner: Ryan Lane) [17:09:25] and I had installed the labsdb100{4,5} before with no problem [17:09:41] so something in the configs of those subnets ? [17:10:35] so i give up on starting brewser, if anyone wants to look at the failed disk, but i just get slightly different kinds of SATA fail [17:11:09] mutante: just kill it again :) [17:11:50] paravoid: ok [17:11:57] perhaps just a dns resolving issue? [17:11:58] although [17:12:12] then the static node statement in the dhcpd.conf shouldn't work either [17:14:46] (PS2) Andrew Bogott: Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:14:47] !log brewster - power down, could not revive due to disk or SATA controller fail [17:14:52] Logged the message, Master [17:16:09] ottomata: hey, I've got to get out of the house soon - I'm about to go crazy. I'm going to grab some lunch and go to a coffee shop or something. how it elastic1005? [17:17:07] 1006? its installing [17:17:08] looking fine [17:17:10] i'm sure i'm good [17:17:11] go ahead [17:17:15] kinda got this down by now :) [17:17:18] (PS1) Dzahn: Revert "Revert "decom: remove brewster incl. mgmt"" [operations/dns] - https://gerrit.wikimedia.org/r/125753 [17:17:34] ottomata: cool.
I'll be online soon [17:18:21] PROBLEM - NTP on elastic1006 is CRITICAL: NTP CRITICAL: No response from NTP server [17:18:24] (PS3) Andrew Bogott: Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:23:11] RobH: mind if I use WMF3409 temporarily to try and reproduce the DHCP installation issue ? [17:23:35] i see it is a spare on eqiad row A, Rack A4 [17:24:41] RECOVERY - SSH on elastic1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.3 (protocol 2.0) [17:28:04] akosiaris: go for it [17:28:13] want the element name? [17:28:21] nope [17:28:25] I just created my own [17:28:29] alexydymium :P [17:28:42] oh here we go [17:28:42] ensuring its a temp host all right ;] [17:28:52] no really now, I will put deleteme [17:28:54] (PS4) Andrew Bogott: Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:28:57] I forgot to tell you when we were hiring akosiaris [17:29:04] makes much more sense [17:29:16] don't. ever. let. him. name stuff [17:29:24] akosiaris: well, its a spare [17:29:27] so if you use the real element [17:29:30] you can NOT revoke its dns [17:29:31] and leave it such [17:29:41] RECOVERY - ElasticSearch health check on elastic1006 is OK: OK - elasticsearch (production-search-eqiad) is running.
status: green: timed_out: false: number_of_nodes: 16: number_of_data_nodes: 16: active_primary_shards: 1896: active_shards: 5611: relocating_shards: 0: initializing_shards: 0: unassigned_shards: 0 [17:29:54] Osmium [17:29:59] (CR) jenkins-bot: [V: -1] Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:30:06] paravoid: i feel that way about everyone though =] [17:30:26] even me, i named one eiximenis =] [17:30:40] (i loved that server name) [17:30:59] (CR) Dzahn: [C: -1] "tried to revive brewster but SATAaPortaAdhardkdiskodriverfailure.ror" [operations/puppet] - https://gerrit.wikimedia.org/r/125748 (owner: Reedy) [17:31:20] (from Greek osme (ὀσμή) meaning "smell") [17:31:29] Osmium ^ [17:31:37] IT WAS MEANT TO BE THAT NAME ;] [17:31:52] * RobH really doesn't care the temp name is fine, he is jokin. [17:32:01] (PS1) Alexandros Kosiaris: Introduce machine deleteme [operations/dns] - https://gerrit.wikimedia.org/r/125754 [17:32:02] * akosiaris already knows :-) [17:32:09] hence the above commit [17:32:16] <_joe_> lol [17:32:31] next temp machine should be 'ch4ng3m3' [17:33:07] aka: previous install password at one of my old jobs (which has since been changed to more sane password hash push by me at later date) [17:33:15] s/has/had/ [17:33:17] WARN: jenkins detected common password in commit message [17:33:34] (PS5) Andrew Bogott: Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:33:44] then again that place also used to but in a root shell terminal hack [17:33:47] i hated that.
[17:34:05] <_joe_> ok gotta bail -see you tomorrow [17:34:11] any managed server could have a crash cart hooked up and the term7 was always logged in as root [17:34:18] (so hacky and wrong) [17:35:11] RECOVERY - Disk space on elastic1006 is OK: DISK OK [17:35:21] RECOVERY - RAID on elastic1006 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [17:35:31] RECOVERY - DPKG on elastic1006 is OK: All packages OK [17:35:31] RECOVERY - check configured eth on elastic1006 is OK: NRPE: Unable to read output [17:35:41] RECOVERY - puppet disabled on elastic1006 is OK: OK [17:35:56] (PS6) Andrew Bogott: Use eth0 IP rather than localhost for multi-region [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:37:14] !log fixing /e/n/interfaces for static configuration for cp40xx, lvs40xx [17:37:19] Logged the message, Master [17:37:39] (CR) Andrew Bogott: [C: 2] "well /that/ was a difficult rebase." [operations/puppet] - https://gerrit.wikimedia.org/r/102345 (owner: Ryan Lane) [17:39:58] (PS6) Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 [17:40:56] (PS1) Alexandros Kosiaris: Adding host deleteme.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/125755 [17:40:58] (CR) jenkins-bot: [V: -1] Adding host deleteme.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/125755 (owner: Alexandros Kosiaris) [17:41:07] (CR) Andrew Bogott: [C: 2] pep8 fixes [operations/debs/adminbot] - https://gerrit.wikimedia.org/r/117691 (owner: Merlijn van Deen) [17:42:14] (CR) Alexandros Kosiaris: [C: 2 V: 2] Adding host deleteme.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/125755 (owner: Alexandros Kosiaris) [17:42:36] (CR) Andrew Bogott: [C: 2] Simplify boolean return [operations/debs/adminbot] - https://gerrit.wikimedia.org/r/121973 (owner: Reedy) [17:42:54] (PS4) Andrew
Bogott: webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - https://gerrit.wikimedia.org/r/112423 (owner: Matanya) [17:44:00] (CR) Andrew Bogott: [C: -1] webserver: fixing duplicate declaration of apache-mpm (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/112423 (owner: Matanya) [17:44:51] (CR) Alexandros Kosiaris: [C: 2] Introduce machine deleteme [operations/dns] - https://gerrit.wikimedia.org/r/125754 (owner: Alexandros Kosiaris) [17:45:35] (CR) Andrew Bogott: "This was already changed via a different patch, right?" [operations/puppet] - https://gerrit.wikimedia.org/r/106907 (owner: Stwalkerster) [17:46:16] (PS7) Dzahn: move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 [17:46:26] (PS5) Andrew Bogott: torrus: move into a module [operations/puppet] - https://gerrit.wikimedia.org/r/108498 (owner: Matanya) [17:46:58] (CR) Chad: [C: 1] move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [17:47:46] !log fixing /e/n/interfaces for static configuration: gadolinium hafnium labsdb1001 labsdb1002 labsdb1003 labstore1001 searchidx1001 ssl1005 ssl1006 ssl1009 virt1001 ytterbium [17:47:49] Logged the message, Master [17:48:05] ok [17:48:07] I have to go now [17:48:21] RECOVERY - NTP on elastic1006 is OK: NTP OK: Offset -0.01428997517 secs [17:48:35] can someone verify that /e/n/i on the boxes above (the list + cp40xx + lvs40xx) is sane? [17:49:19] akosiaris maybe?
:) [17:49:34] I was already looking at copper and tantalum [17:49:39] but seems like my check was bad [17:49:42] and these are ok [17:50:01] I 'll have a look at the rest [17:56:39] (PS11) BryanDavis: [WIP] Configure scap master and clients in beta [operations/puppet] - https://gerrit.wikimedia.org/r/123674 [17:57:39] (CR) Giuseppe Lavagetto: [C: -1] "Some variable assignments and one define are missing from site.pp, or am I missing something?" (2 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [17:59:04] (CR) Dzahn: "i removed them after the comments on PS4" [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [18:00:20] (CR) Dzahn: move LDAP admin permissions,tools out of site.pp (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [18:00:31] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [18:03:36] <_joe|away> mutante: strange, I cannot see the comments on that commit [18:04:43] <^d> _joe|away: https://gerrit.wikimedia.org/r/#/c/125721/4..7/manifests/admins.pp [18:04:50] <^d> That shows 'em [18:05:17] johnduhart: https://gerrit.wikimedia.org/r/#/c/125721/4/manifests/role/ldap.pp [18:05:22] <_joe|away> yes got them mutante [18:05:26] sorry, wrong nick [18:05:37] <_joe|away> sorry the gerrit interface defies me sometimes :( [18:05:52] <_joe|away> I suppose I'll get used to it. [18:06:13] ottomata1: back! [18:06:23] and fwiw, the question of renaming labs users later [18:06:37] https://wikitech.wikimedia.org/wiki/Renaming_users [18:06:44] ottomata1: every time you repartition the machines gmond needs restarting [18:06:47] i'll do that with my own user later [18:06:47] manybubbles: cool!
[18:06:50] its like it comes up 80% busted [18:06:52] oh, hm [18:06:55] weird ok [18:07:01] so i also have full name [18:07:02] hey i forgot you wanted to do something to this node before I turned it back on [18:07:06] its moving shards back to it now [18:07:06] (CR) Giuseppe Lavagetto: [C: 1] move LDAP admin permissions,tools out of site.pp [operations/puppet] - https://gerrit.wikimedia.org/r/125721 (owner: Dzahn) [18:07:17] ottomata: its cool - I'll get the next one [18:42:38] * jdlrobson looks around [18:43:00] is ariel here? [18:44:36] <^d> jdlrobson: Maybe maybe not, apergo s says idle for ~2h [18:44:36] apergos: ^^ [18:44:43] Also !ask [18:44:45] jdlrobson: its 21:44 there [18:44:45] But whatever [18:44:50] so doubtful [18:44:53] Yeh I suspected so [18:44:57] she mailed me so i thought i'd try [18:45:03] having been in that time zone all last week [18:45:13] i have new found pity when they stay up late workig with PST folks like me [18:45:17] ok it's nothing urgent. Maybe I can catch her this afternoon [18:45:39] RobH: yehhh I occasionally work in London on SF hours - that can be painful. [18:45:43] i also think that faidon may actually be at minimum identical twins who are portraying themselves as a single person [18:45:50] due to his never seeming to sleep [18:46:14] though if it turned out to be identical triplets so one is always actively working on an issue at all hours i wouldnt be exactly shocked. [18:46:51] PROBLEM - MySQL Idle Transactions on db1047 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:46:56] yea, when i was on east coast time it was in that sweet spot of being able to shift to really work with anyone easily for the majority of our covered timezones.
[18:48:51] RECOVERY - MySQL Idle Transactions on db1047 is OK: OK longest blocking idle transaction sleeps for 0 seconds [19:00:31] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.000555555555556 [19:05:51] (PS1) Jforrester: Enable VisualEditor for opt-in on Meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125774 [20:02:46] :O [20:08:17] (PS1) Ottomata: Adding zookeeper::jmxtrans class [operations/puppet/zookeeper] - https://gerrit.wikimedia.org/r/125856 [20:11:48] (CR) Ottomata: [C: 2 V: 2] Adding zookeeper::jmxtrans class [operations/puppet/zookeeper] - https://gerrit.wikimedia.org/r/125856 (owner: Ottomata) [20:13:18] !log deployed Parsoid fba548cbf (deploy repo sha d0e12ddf) [20:13:23] Logged the message, Master [20:16:21] (PS1) Ottomata: Sending Zookeeper JVM stats to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/125865 [20:18:04] (PS2) Ottomata: Sending Zookeeper JVM stats to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/125865 [20:18:11] hey subbu, [20:18:26] parsoid uses git submodules and trebuchet/git-deploy to deploy, right? [20:18:27] hi ottomata2 [20:18:29] yes [20:18:36] have you had trouble with that? [20:18:38] specificically [20:18:51] https://github.com/trebuchet-deploy/trigger/issues/27 [20:18:54] not recently .. but early on gwicke had issues with it. [20:20:32] ottomata, gwicke and i are currently verifying our deploy .. so, will get back after that. [20:23:13] ottomata, I think some of the bugs we ran into are now fixed, but I'd still expect some issues with a new repo [20:23:56] yeah, my repo is relatively new [20:23:57] like last week [20:24:26] I can sometimes make it work by cding into the target's .git/modules/ dir and running git update-server-info after an unsuccessful deploy [20:24:30] but other times not :( [20:24:37] are you planning to deploy a service that needs restarts?
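(Editor's sketch, not part of the channel log.) The workaround mentioned above, running `git update-server-info` inside the submodule git directories under `.git/modules/`, could be scripted roughly as follows. The function names and the injectable `run` parameter are assumptions for illustration; this is not part of Trebuchet or git-deploy, and as the log notes, the workaround itself only sometimes helps.

```python
import os
import subprocess


def submodule_git_dirs(repo_path):
    """Yield the git directories of submodules kept under <repo>/.git/modules."""
    modules_root = os.path.join(repo_path, ".git", "modules")
    for dirpath, dirnames, filenames in os.walk(modules_root):
        # Heuristic: a git directory has a HEAD file and an objects directory.
        if "HEAD" in filenames and "objects" in dirnames:
            yield dirpath


def refresh_server_info(repo_path, run=subprocess.check_call):
    """Run `git update-server-info` in every submodule git directory found.

    `run` is injectable so the command list can be inspected without
    actually invoking git.
    """
    refreshed = []
    for git_dir in submodule_git_dirs(repo_path):
        run(["git", "--git-dir=" + git_dir, "update-server-info"])
        refreshed.append(git_dir)
    return refreshed
```

Calling `refresh_server_info("/path/to/deploy/repo")` (a placeholder path) would regenerate the auxiliary ref and pack listings that dumb-HTTP fetches rely on, for each submodule found.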
[20:25:20] asking because we currently can't use salt for that [20:32:53] (CR) Ottomata: [C: 2 V: 2] Sending Zookeeper JVM stats to Ganglia [operations/puppet] - https://gerrit.wikimedia.org/r/125865 (owner: Ottomata) [20:32:56] gwicke: no [20:33:21] just a repository with some git-fat artifacts and a single submodule (right now) [20:33:41] https://github.com/wikimedia/analytics-kraken-deploy [20:33:50] k [20:35:47] (PS1) Ottomata: Fixing zookeeper::jmxtrans class name [operations/puppet/zookeeper] - https://gerrit.wikimedia.org/r/125870 [20:35:58] (CR) Ottomata: [C: 2 V: 2] Fixing zookeeper::jmxtrans class name [operations/puppet/zookeeper] - https://gerrit.wikimedia.org/r/125870 (owner: Ottomata) [20:36:18] (PS1) Ottomata: Updating zookeeper submodule with proper class name [operations/puppet] - https://gerrit.wikimedia.org/r/125871 [20:36:28] (CR) Ottomata: [C: 2 V: 2] Updating zookeeper submodule with proper class name [operations/puppet] - https://gerrit.wikimedia.org/r/125871 (owner: Ottomata) [20:39:06] (PS2) Ottomata: Changing erbium's udp2log instance to use unicast on port 8419 [operations/puppet] - https://gerrit.wikimedia.org/r/125744 [20:48:22] ottomata: do you plan to do another node today? [20:48:27] I haven't been paying attention [20:48:40] naw, not today, will continue tomorrow [20:48:59] cool - ping me and I'll do my customization thingy [20:49:12] k [20:49:14] thanks for doing these! [20:53:45] yup! [21:00:08] andrewbogott: hai [21:17:28] bblack, hi, any luck with automating wget through api authentication? [21:21:24] (CR) Giuseppe Lavagetto: [C: -1] "Did not realize that urllib3 is not packaged in Precise. I will probably revert to urllib2, which is a pity." [operations/puppet] - https://gerrit.wikimedia.org/r/125726 (owner: Giuseppe Lavagetto) [21:23:51] (CR) Ottomata: "It probably wouldn't be hard to package it for Precise!"
[operations/puppet] - https://gerrit.wikimedia.org/r/125726 (owner: Giuseppe Lavagetto) [21:28:13] AzaToth: hi! Struggling with jetlag, sorry, nodded off for a bit :) [21:28:16] what's up? [21:31:10] andrewbogott: +1 i feel exhausted right now. [21:31:25] but its just past midnight now for athens [21:31:43] this is when i would ditch the group and go to bed at the hotel (while they continued to drink) [21:31:45] I stayed awake all day yesterday, thought I had this beat [21:32:00] I stayed awake till 9:30 or so but still woke up during night a few times [21:32:19] hopefully by tomorrow im back to normal schedule, im not letting myself take a nap [21:32:29] but im now also kind of useless for getting shit done, heh. [21:33:26] * RobH won't be pushing gerrit changes today. [21:33:56] I got a lot done before 8 AM today. The rest of the day has been pretty slack [21:36:51] (PS1) Reedy: Upgrade to jquery 2.0.2 and jquery-ui 1.10.4 [operations/software] - https://gerrit.wikimedia.org/r/125883 [21:40:18] (PS5) Andrew Bogott: openstack: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118716 (owner: Hashar) [21:41:07] (PS2) Reedy: Upgrade to jquery 2.0.2 and jquery-ui 1.10.4 [operations/software] - https://gerrit.wikimedia.org/r/125883 [21:42:42] (CR) Andrew Bogott: [C: 2] openstack: generic::upstart_job() now uses boolean values [operations/puppet] - https://gerrit.wikimedia.org/r/118716 (owner: Hashar) [21:54:57] (PS1) BryanDavis: beta: New script to restart apaches [operations/puppet] - https://gerrit.wikimedia.org/r/125888 [22:05:41] andrewbogott: you wondered about my comment on https://gerrit.wikimedia.org/r/#/c/68935/ [22:06:00] yeah -- do you mean that after your patch there is license? [22:06:48] no, the issue is that there is no licence statement at all regarding the files [22:06:58] Hm, ok. [22:07:13] I'm not sure if that means that there's a default fallback.
I'll look in to it. [22:07:20] and it's unclear whom the original author is [22:07:58] It was downloaded from http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/ircecho [22:08:04] Unknown location: /trunk/tools/ircecho [22:08:27] thus I've no idea whom made the script [22:08:40] if it's under any permissible licence [22:08:42] or whatever [22:09:00] and license says: [22:09:04] " Unlicensed for adminlogbot.py and adminlog.py (for now) [22:09:04] GPL3 for statusnet.py [22:09:04] " [22:09:12] Hasn't this been moved somewhere else? I remember adjusting old SVN links when there were still around. [22:10:34] is there a HTTP proxy in eqiad? We used to use brewster to update a git checkout on a test host, but that's now down. [22:12:00] nm, found it at https://wikitech.wikimedia.org/wiki/Http_proxy [22:13:35] (CR) BryanDavis: "Cherry-picked into deployment-salt and applied there:" [operations/puppet] - https://gerrit.wikimedia.org/r/125888 (owner: BryanDavis) [22:18:07] (PS2) AzaToth: Updating debian package files [operations/debs/adminbot] - https://gerrit.wikimedia.org/r/68935 [22:18:14] rebased it [22:22:51] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [22:55:04] (CR) Aaron Schulz: [C: 2] Limit large (djvu) file downloads for thumbnails [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125537 (owner: Aaron Schulz) [22:55:18] (Merged) jenkins-bot: Limit large (djvu) file downloads for thumbnails [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125537 (owner: Aaron Schulz) [22:56:20] !log aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Limit large (djvu) file downloads for thumbnails' [22:56:25] Logged the message, Master [22:59:41] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 194.433334 [23:00:21] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL:
kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 120.900002 [23:01:38] AaronSchulz: Are you done deploying stuff? [23:02:52] yeah [23:03:00] k [23:06:38] RoanKattouw: I can do SWAT, if you like [23:07:17] ori: That would be quite welcome actually [23:07:19] Go for it [23:07:22] I'll be around if you need anything [23:08:12] * ori does. [23:10:04] (PS2) Ori.livneh: Enable VisualEditor on French Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125538 (owner: Catrope) [23:10:16] (CR) Ori.livneh: [C: 2] Enable VisualEditor on French Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125538 (owner: Catrope) [23:11:42] (Merged) jenkins-bot: Enable VisualEditor on French Wikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125538 (owner: Catrope) [23:12:04] !log ori updated /a/common to {{Gerrit|I59f5a6e0b}}: Enable VisualEditor on French Wikinews [23:12:09] Logged the message, Master [23:12:46] !log ori synchronized visualeditor.dblist 'I59f5a6e0b: Enable VisualEditor on French Wikinews' [23:12:51] Logged the message, Master [23:12:55] (PS2) Ori.livneh: Enable VisualEditor for opt-in on Meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125774 (owner: Jforrester) [23:13:04] (CR) Ori.livneh: [C: 2] Enable VisualEditor for opt-in on Meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125774 (owner: Jforrester) [23:13:19] (Merged) jenkins-bot: Enable VisualEditor for opt-in on Meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/125774 (owner: Jforrester) [23:14:00] !log ori updated /a/common to {{Gerrit|I22f25730d}}: Enable VisualEditor for opt-in on Meta [23:14:06] Logged the message, Master [23:15:19] !log ori synchronized visualeditor.dblist 'I22f25730d: Enable VisualEditor for opt-in on Meta (1/2)' [23:15:24] Logged the message, Master [23:15:28] !log ori synchronized wmf-config/InitialiseSettings.php
'I22f25730d: Enable VisualEditor for opt-in on Meta (2/2)' [23:15:34] Logged the message, Master [23:17:10] !log ori synchronized php-1.23wmf22/skins/vector/variables.less 'Ibcdaff017: Revert body font stack to be just sans-serif' [23:17:16] Logged the message, Master [23:19:51] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [23:20:21] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [23:22:32] ori: Could you add https://gerrit.wikimedia.org/r/125904 to the VE cherry-picks? [23:23:05] RoanKattouw: sure [23:38:37] !log ori Started scap: (no message) [23:38:42] Logged the message, Master [23:39:06] grr, forgot message [23:39:54] !log scap: php-1.23wmf22/extensions/VisualEditor 2b0979f...0652ad2 (I12e5c9751) [23:39:59] Logged the message, Master [23:40:05] Heh [23:40:26] * bd808 has thought about making that a fatal error in scap [23:42:18] bd808: having the log level encoded twice in each line (as a color and by name) is a bit much [23:42:38] i'd omit the name [23:42:52] '23:39:22 DEBUG - Started sync-common to proxies' -> '23:39:22 - Started sync-common to proxies' [23:43:09] !log ori Finished scap: (no message) (duration: 04m 31s) [23:43:14] Logged the message, Master [23:44:57] RoanKattouw: can you confirm that things look OK? i've checked VE on frwikinews and meta, but I don't know how to verify the various fixes you cherry-picked [23:45:08] James_F: ----^^ ? [23:46:09] bd808: also, in the vast majority of cases, free-form message input is obviated by having the commit range calculated automatically [23:46:52] ryasmeen: ^^^ [23:47:09] ryasmeen: From quick testing it looks good to me /cc RoanKattouw ori. [23:47:30] James_F: thanks! [23:47:51] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 25.733334 [23:48:07] ori: Hmmm... how would scap calculate the commit range? 
By looking at the diff between HEAD in /usr/local/common and /a/common? [23:48:21] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 154.866669 [23:48:47] bd808: you can snapshot the current commit at the end of each scap [23:50:19] I guess. And then account for all the sync-{file,dir} calls that happen in between in the typical week. Of course that would be easier if scap was the only tool really used. [23:51:33] Or actually make some movement towards a transport mechanism that includes tagging :) [23:52:53] yes James_F, its working for me too
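(Editor's sketch, not part of the channel log.) The idea floated above, snapshotting the deployed commit at the end of each scap so that the next run can log a commit range instead of relying on a free-form message, might look roughly like this. The function names, the message layout, and the placeholder SHAs are all assumptions; this is not scap's actual behaviour, and it ignores the sync-file and sync-dir calls that bd808 points out happen between scaps.

```python
def commit_range(previous_sha, current_sha, abbrev=7):
    """Return an abbreviated 'old...new' range, or None if nothing changed.

    previous_sha is assumed to be the commit recorded at the end of the
    last scap, current_sha the commit about to be deployed.
    """
    if previous_sha == current_sha:
        return None
    return "%s...%s" % (previous_sha[:abbrev], current_sha[:abbrev])


def scap_log_message(target, previous_sha, current_sha):
    """Build a log line in the spirit of the hand-written !log entry above."""
    rng = commit_range(previous_sha, current_sha)
    if rng is None:
        return "scap: %s (no new commits)" % target
    return "scap: %s %s" % (target, rng)


if __name__ == "__main__":
    # Placeholder 40-character SHAs, not real commits.
    old, new = "a" * 40, "b" * 40
    print(scap_log_message("php-1.23wmf22/extensions/VisualEditor", old, new))
```

Persisting `current_sha` somewhere after a successful run (a file, a tag) is left out here; the tagging-based transport bd808 mentions would be one way to do it.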