[00:22:40] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Sun 16 Mar 2014 12:19:26 PM UTC [00:23:40] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [01:23:50] (03PS1) 10Springle: dbstore partman use regular db.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/119000 [01:25:25] (03CR) 10Springle: [C: 032] dbstore partman use regular db.cfg [operations/puppet] - 10https://gerrit.wikimedia.org/r/119000 (owner: 10Springle) [01:29:10] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [01:50:57] it seems west coast verizon users have been reporting connection troubles over the past 2 weeks. [01:51:00] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#server_problems.3F [01:51:49] eh east coast [02:01:09] !log LocalisationUpdate failed: mwversionsinuse returned empty list [02:01:18] Logged the message, Master [02:44:20] (03CR) 10BryanDavis: "> drwxrws--- 6 root wikidev 4096 Mar 10 03:37 scap" [operations/puppet] - 10https://gerrit.wikimedia.org/r/118946 (owner: 10Reedy) [02:49:23] and again [03:22:53] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Sun 16 Mar 2014 12:19:26 PM UTC [03:23:53] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [05:25:35] (03PS1) 10ArielGlenn: remove WikiDump dependency, pep8-ify, cleanup usage message [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119003 [05:29:54] (03CR) 10ArielGlenn: [C: 032] remove WikiDump dependency, pep8-ify, cleanup usage message [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119003 (owner: 10ArielGlenn) [05:52:53] (03PS1) 10ArielGlenn: detabify and whitespace cleanup of html and other templates [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119004 [05:54:56] (03CR) 10ArielGlenn: [C: 032] detabify and whitespace 
cleanup of html and other templates [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119004 (owner: 10ArielGlenn) [06:23:53] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Sun 16 Mar 2014 12:19:26 PM UTC [06:24:53] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [06:35:00] (03PS1) 10ArielGlenn: dumps templates, configs, dblists added to snapshot module [operations/puppet] - 10https://gerrit.wikimedia.org/r/119006 [06:38:00] (03CR) 10ArielGlenn: [C: 032] dumps templates, configs, dblists added to snapshot module [operations/puppet] - 10https://gerrit.wikimedia.org/r/119006 (owner: 10ArielGlenn) [06:47:50] (03PS1) 10ArielGlenn: eqiad snapshot hosts to use puppetized templates and configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119007 [06:48:34] (03CR) 10jenkins-bot: [V: 04-1] eqiad snapshot hosts to use puppetized templates and configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119007 (owner: 10ArielGlenn) [06:49:58] (03PS2) 10ArielGlenn: eqiad snapshot hosts to use puppetized templates and configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119007 [06:52:35] (03CR) 10ArielGlenn: [C: 032] eqiad snapshot hosts to use puppetized templates and configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119007 (owner: 10ArielGlenn) [07:11:44] (03PS1) 10ArielGlenn: snapshot central auth dump to use puppetized configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119008 [07:13:09] (03CR) 10ArielGlenn: [C: 032] snapshot central auth dump to use puppetized configs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119008 (owner: 10ArielGlenn) [07:17:54] (03PS1) 10ArielGlenn: on snapshots fix config file names [operations/puppet] - 10https://gerrit.wikimedia.org/r/119009 [07:19:28] (03CR) 10ArielGlenn: [C: 032] on snapshots fix config file names [operations/puppet] - 
10https://gerrit.wikimedia.org/r/119009 (owner: 10ArielGlenn) [07:54:51] (03PS1) 10ArielGlenn: puppetize page titles generation on snapshot hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/119011 [07:57:30] (03CR) 10ArielGlenn: [C: 032] puppetize page titles generation on snapshot hosts [operations/puppet] - 10https://gerrit.wikimedia.org/r/119011 (owner: 10ArielGlenn) [08:28:26] (03PS1) 10ArielGlenn: puppetize conf file for pagetitles dump on snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119013 [08:31:11] (03PS2) 10ArielGlenn: puppetize conf file for pagetitles dump on snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119013 [08:32:45] (03CR) 10ArielGlenn: [C: 032] puppetize conf file for pagetitles dump on snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119013 (owner: 10ArielGlenn) [08:59:58] (03PS1) 10Ori.livneh: GeoIP cookie: expand deployment from Labs to production [operations/puppet] - 10https://gerrit.wikimedia.org/r/119014 [09:04:03] PROBLEM - SSH on lvs4001 is CRITICAL: Server answer: [09:06:03] RECOVERY - SSH on lvs4001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [09:24:53] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Sun 16 Mar 2014 12:19:26 PM UTC [09:25:53] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [09:45:26] (03CR) 10Mark Bergsma: Initial commit of pmacct module and role (036 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/115345 (owner: 10Jkrauska) [10:28:20] (03PS1) 10ArielGlenn: detabify and whitespace fixes for one more dumps template [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119022 [10:29:12] (03CR) 10ArielGlenn: [C: 032] detabify and whitespace fixes for one more dumps template [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119022 (owner: 10ArielGlenn) [10:31:07] (03PS1) 10ArielGlenn: detabify, white 
space fixes for adds-changes template [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119023 [10:31:48] (03CR) 10ArielGlenn: [C: 032] detabify, white space fixes for adds-changes template [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119023 (owner: 10ArielGlenn) [10:37:52] (03PS1) 10ArielGlenn: update language about maxrevid content for addschanges dumps [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119024 [10:40:43] (03CR) 10ArielGlenn: [C: 032] update language about maxrevid content for addschanges dumps [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119024 (owner: 10ArielGlenn) [10:51:03] PROBLEM - LDAP on virt1000 is CRITICAL: Connection refused [11:01:36] (03PS1) 10ArielGlenn: puppetize adds-changes dumps (so-called incrementals) [operations/puppet] - 10https://gerrit.wikimedia.org/r/119026 [11:04:33] (03CR) 10ArielGlenn: [C: 032] puppetize adds-changes dumps (so-called incrementals) [operations/puppet] - 10https://gerrit.wikimedia.org/r/119026 (owner: 10ArielGlenn) [11:05:13] PROBLEM - LDAPS on virt1000 is CRITICAL: Connection refused [11:06:03] (03PS1) 10Gilles: Expose some headers to CORS requests [operations/puppet] - 10https://gerrit.wikimedia.org/r/119027 [11:11:39] (03PS1) 10ArielGlenn: fix up 'maintained by puppet' notices for couple of dumps templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/119028 [11:13:19] (03CR) 10ArielGlenn: [C: 032] fix up 'maintained by puppet' notices for couple of dumps templates [operations/puppet] - 10https://gerrit.wikimedia.org/r/119028 (owner: 10ArielGlenn) [11:28:50] (03PS2) 10Alexandros Kosiaris: Support spatial_ref_sys and comments in postgres [operations/puppet] - 10https://gerrit.wikimedia.org/r/117920 [11:43:07] (03PS1) 10Alexandros Kosiaris: Creating a spatially enabled db on osmlab dbs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119031 [11:56:15] (03CR) 10Faidon Liambotis: [C: 032] Expose some headers to CORS requests 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/119027 (owner: 10Gilles) [12:00:27] (03PS1) 10Faidon Liambotis: Remove language subdomains for wikidata.org [operations/dns] - 10https://gerrit.wikimedia.org/r/119032 [12:03:08] (03CR) 10Faidon Liambotis: "Also see I084175ffc2545ca68ddd21d1ecf7760f959f6c6b." [operations/dns] - 10https://gerrit.wikimedia.org/r/119032 (owner: 10Faidon Liambotis) [12:03:57] (03CR) 10Faidon Liambotis: "Also see I313d8643ccdd5c3790a87883a4f8ec258f91c103." [operations/apache-config] - 10https://gerrit.wikimedia.org/r/113972 (owner: 10Thiemo Mättig (WMDE)) [12:07:08] (03PS1) 10ArielGlenn: puppetize generation of lists of good dumps for rsyncers [operations/puppet] - 10https://gerrit.wikimedia.org/r/119033 [12:08:24] (03CR) 10jenkins-bot: [V: 04-1] puppetize generation of lists of good dumps for rsyncers [operations/puppet] - 10https://gerrit.wikimedia.org/r/119033 (owner: 10ArielGlenn) [12:11:10] (03PS2) 10ArielGlenn: puppetize generation of lists of good dumps for rsyncers [operations/puppet] - 10https://gerrit.wikimedia.org/r/119033 [12:13:07] (03PS1) 10Faidon Liambotis: autoinstall: add ms-be3004's second NIC to dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/119035 [12:13:29] (03CR) 10Faidon Liambotis: [C: 032] autoinstall: add ms-be3004's second NIC to dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/119035 (owner: 10Faidon Liambotis) [12:14:08] (03CR) 10Faidon Liambotis: [V: 032] autoinstall: add ms-be3004's second NIC to dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/119035 (owner: 10Faidon Liambotis) [12:14:24] (03PS3) 10ArielGlenn: puppetize generation of lists of good dumps for rsyncers [operations/puppet] - 10https://gerrit.wikimedia.org/r/119033 [12:15:58] (03CR) 10ArielGlenn: [C: 032] puppetize generation of lists of good dumps for rsyncers [operations/puppet] - 10https://gerrit.wikimedia.org/r/119033 (owner: 10ArielGlenn) [12:23:29] (03PS1) 10ArielGlenn: more pep8 for script to list 
last good dumps [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119037 [12:23:56] (03CR) 10ArielGlenn: [C: 032] more pep8 for script to list last good dumps [operations/dumps] (ariel) - 10https://gerrit.wikimedia.org/r/119037 (owner: 10ArielGlenn) [12:25:53] PROBLEM - Puppet freshness on tungsten is CRITICAL: Last successful Puppet run was Sun 16 Mar 2014 12:19:26 PM UTC [12:26:53] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [13:04:04] (03PS1) 10Alexandros Kosiaris: Convert carbon to RAID5+LVM [operations/puppet] - 10https://gerrit.wikimedia.org/r/119039 [13:06:23] RECOVERY - Puppet freshness on carbon is OK: puppet ran at Mon Mar 17 13:06:16 UTC 2014 [13:11:53] RECOVERY - NTP on carbon is OK: NTP OK: Offset -0.04691720009 secs [13:29:03] PROBLEM - Certificate expiration on virt1000 is CRITICAL: SSL error: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed [13:29:03] RECOVERY - LDAP on virt1000 is OK: TCP OK - 0.000 second response time on port 389 [13:29:13] RECOVERY - LDAPS on virt1000 is OK: TCP OK - 0.000 second response time on port 636 [13:29:22] (03PS1) 10Faidon Liambotis: Revert "Initial commit of mwprof::reporter" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119042 [13:30:59] (03PS2) 10Faidon Liambotis: Revert "Initial commit of mwprof::reporter" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119042 [13:31:05] (03CR) 10Faidon Liambotis: [C: 032] Revert "Initial commit of mwprof::reporter" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119042 (owner: 10Faidon Liambotis) [13:32:53] ori: see ^^^ -- https://gerrit.wikimedia.org/r/119042 [13:34:53] RECOVERY - Puppet freshness on tungsten is OK: puppet ran at Mon Mar 17 13:34:43 UTC 2014 [13:37:51] apergos: labstore1001 has a disk critical alert for /exp/dumps (/exp?! wtf?), is that you or Coren? 
[13:38:32] (03PS1) 10ArielGlenn: puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 [13:39:29] I guess that's him switching the job from gluster to nfs without having a notion of the space requirements [13:39:44] okay [13:39:56] I have no idea really, I just saw "dumps" and pinged you :-) [13:40:08] I can turn off that job in about 2 minutes [13:40:09] (03CR) 10Alexandros Kosiaris: [C: 032] Support spatial_ref_sys and comments in postgres [operations/puppet] - 10https://gerrit.wikimedia.org/r/117920 (owner: 10Alexandros Kosiaris) [13:40:15] 3 minutes [13:40:27] (03CR) 10Alexandros Kosiaris: [C: 032] Creating a spatially enabled db on osmlab dbs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119031 (owner: 10Alexandros Kosiaris) [13:40:31] (03PS2) 10ArielGlenn: puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 [13:40:42] no worries, let's just ping him when he wakes up [13:40:48] yep I sure will [13:41:02] (03CR) 10Alexandros Kosiaris: [C: 032] Convert carbon to RAID5+LVM [operations/puppet] - 10https://gerrit.wikimedia.org/r/119039 (owner: 10Alexandros Kosiaris) [13:41:09] paravoid: It's me. /exp isn't actual disks, it's the NFS4 export tree. Lemme go look at this. [13:41:15] oh here he is [13:41:19] ah, you're here [13:41:21] hello :) [13:41:58] !log rebuilding CirrusSearch index for enwiki - its mapping is out of date, making the search results worse. [13:42:01] Logged the message, Master [13:42:08] (03PS3) 10ArielGlenn: puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 [13:42:38] Ah. Fun. 10T wasn't enough for the dumps, it seems. [13:42:46] ah no, not quite [13:42:50] * Coren adds moar space. [13:42:56] I can cut back the number we transfer or something [13:43:13] How much is actually needed? 
[13:43:21] I can tell you in about 5mins [13:43:22] sorry [13:43:23] This is lvs, and I have some room to spare. [13:43:29] lvm* [13:43:56] perhaps a very silly question, but why do we need to copy them rather than mount them off the snapshot hosts? [13:44:44] well it used to be that they wanted a public share in labs for 'stuff' [13:44:53] paravoid: I'm just doing it the way they were done previously with gluster. A priori, I don't think we /need/ to copy them, but I think that was done to insulate labs from them. [13:45:00] so a subset of dumps wound up on that share [13:45:39] any way to limit bandwidth use of folks accessing those files over nfs? [13:46:14] paravoid: Also, that'd mean we either (a) let labs instances mount directly or (b) mount and reexport from labstore, which seems sucky to me. [13:46:57] apergos: Not trivially. NFS doesn't have any support for traffic shaping; we could probably hack something with suitable iptables rules but that seems brittle. [13:47:12] (03PS4) 10ArielGlenn: puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 [13:47:20] why is it too hacky to mount & reexport? [13:47:27] sucky, even [13:47:46] paravoid: Well, it doubles network traffic for one. :-) [13:47:48] I mean, sure, it's not great, but copying >10T of data for no reason is worse [13:47:56] probably? [13:48:15] Whereas the copy just costs an rsync of the new stuff. [13:48:33] yes, it's only 10T this first time cause it's a new share that's empty [13:49:21] apergos: I just looked at things a little closer, and now realize that /that/ disk isn't lvm. Crap. (It lives on the non-redundant disks) [13:49:28] oh hm [13:49:40] I can always send over 3 instead of 5 or whatever [13:50:14] That'd work. I should say that 99% of labs' uses of dumps cluster around the latest ones anyways. 
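The idea floated above, sending over only the newest few dumps and dropping the oldest ones on the destination, might be sketched roughly as follows. This is only an illustration, not the script used on the dataset hosts: it assumes dump directories are named by date as YYYYMMDD, and the function name split_dumps and the sample directory names are hypothetical.

```python
import re

# Assumption: each dump run lives in a directory named YYYYMMDD.
DATE_DIR = re.compile(r"^\d{8}$")


def split_dumps(dir_names, keep=2):
    """Split dump directory names into (kept, removable).

    The newest `keep` dated directories are kept; older dated
    directories are reported as removable. Names that do not look
    like a date (for example a 'latest' link) are ignored entirely.
    """
    dated = sorted((d for d in dir_names if DATE_DIR.match(d)), reverse=True)
    return dated[:keep], dated[keep:]


if __name__ == "__main__":
    # Hypothetical directory listing for one wiki.
    names = ["20131001", "20131104", "20140102", "latest", "20140304"]
    kept, removable = split_dumps(names, keep=2)
    print("keep:", kept)
    print("could remove:", removable)
```

Because YYYYMMDD names sort chronologically as plain strings, no date parsing is needed. The sketch only reports candidates; it deliberately deletes nothing.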
[13:51:26] (03PS5) 10ArielGlenn: puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 [13:51:30] gahhh rebasaes [13:51:44] apergos: But first, do we want to export/reexport the actual partitions instead? Now's a good time to ask that question. [13:52:19] I don't want a pile of lab users copying at top speed to compromise bandwidth for dumps production [13:52:36] that would be a very bad deal indeed, so I would want limits [13:53:24] (03CR) 10ArielGlenn: [C: 032] puppetize generation of media directory lists for snapshots [operations/puppet] - 10https://gerrit.wikimedia.org/r/119043 (owner: 10ArielGlenn) [13:57:41] apergos: It's probably safest to keep doing a copy then. [13:59:34] pagecounts [13:59:38] they wanted those too... meh [14:00:46] 3.6T for those and they will only get larger over time [14:02:22] I'll do last 2 good for the umps [14:02:24] dumps [14:03:51] (03PS1) 10Alexandros Kosiaris: install-server::dhcp-server made precise aware [operations/puppet] - 10https://gerrit.wikimedia.org/r/119046 [14:04:52] (03PS1) 10ArielGlenn: adjust number of rsyncers to labs public data from datasets [operations/puppet] - 10https://gerrit.wikimedia.org/r/119047 [14:05:30] (03CR) 10Alexandros Kosiaris: [C: 032] install-server::dhcp-server made precise aware [operations/puppet] - 10https://gerrit.wikimedia.org/r/119046 (owner: 10Alexandros Kosiaris) [14:05:52] (03PS2) 10ArielGlenn: adjust number of rsyncers to labs public data from datasets [operations/puppet] - 10https://gerrit.wikimedia.org/r/119047 [14:05:54] (03CR) 10jenkins-bot: [V: 04-1] adjust number of rsyncers to labs public data from datasets [operations/puppet] - 10https://gerrit.wikimedia.org/r/119047 (owner: 10ArielGlenn) [14:06:38] riiiigght [14:06:47] bogus [14:07:32] (03PS3) 10ArielGlenn: adjust number of rsyncers to labs public data from datasets [operations/puppet] - 10https://gerrit.wikimedia.org/r/119047 [14:08:43] apergos: Do you need 
me to rm some stuff on the destination? [14:09:06] yeah if you don't mind tossing enwiki 4 and 5 [14:09:08] oldest [14:09:11] kk [14:09:12] that would be awesome [14:09:25] (03CR) 10ArielGlenn: [C: 032] adjust number of rsyncers to labs public data from datasets [operations/puppet] - 10https://gerrit.wikimedia.org/r/119047 (owner: 10ArielGlenn) [14:10:22] (03PS1) 10Alexandros Kosiaris: Fix dependency in install-server::dhcp-server [operations/puppet] - 10https://gerrit.wikimedia.org/r/119049 [14:10:33] I got six in there actually. [14:11:12] Oct-Mar inclusive [14:11:23] toss oldest 3? [14:11:47] (03CR) 10Alexandros Kosiaris: [C: 032] Fix dependency in install-server::dhcp-server [operations/puppet] - 10https://gerrit.wikimedia.org/r/119049 (owner: 10Alexandros Kosiaris) [14:12:30] yes [14:12:49] that wil give plenty o room for rsync to finish its job on the next run [14:13:13] * Coren exterminates. [14:13:13] RECOVERY - Disk space on labstore1001 is OK: DISK OK [14:13:23] sweet [14:13:51] That just freed 2.3T [14:14:45] any progress with labstore1001's puppet btw? [14:14:46] :) [14:14:49] enwiki is crazy big. [14:14:51] (03PS1) 10Alexandros Kosiaris: Fix typo introduced in d7d97a0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/119050 [14:15:06] yep it's huge [14:15:26] paravoid: I'll be able to reenable it in about 6h [14:15:30] awesome [14:16:26] (03CR) 10Alexandros Kosiaris: [C: 032] Fix typo introduced in d7d97a0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/119050 (owner: 10Alexandros Kosiaris) [14:24:19] !log disabling puppet on dataset2 for shuffling data sources around [14:24:23] Logged the message, Master [14:30:34] (03PS1) 10Alexandros Kosiaris: install-server module. Precise has squid3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/119052 [14:32:32] (03CR) 10Alexandros Kosiaris: [C: 032] install-server module. 
Precise has squid3 [operations/puppet] - 10https://gerrit.wikimedia.org/r/119052 (owner: 10Alexandros Kosiaris) [14:39:02] (03PS1) 10coren: Labs: remove mode from mountpoints files {} [operations/puppet] - 10https://gerrit.wikimedia.org/r/119053 [14:40:29] (03PS1) 10Alexandros Kosiaris: role::installserver pin packages varied on distro [operations/puppet] - 10https://gerrit.wikimedia.org/r/119054 [14:40:37] (03CR) 10Hashar: [C: 031] Labs: remove mode from mountpoints files {} [operations/puppet] - 10https://gerrit.wikimedia.org/r/119053 (owner: 10coren) [14:43:04] (03CR) 10Alexandros Kosiaris: [C: 032] role::installserver pin packages varied on distro [operations/puppet] - 10https://gerrit.wikimedia.org/r/119054 (owner: 10Alexandros Kosiaris) [14:43:25] (03PS2) 10coren: Labs: remove mode and owner from mountpoints dirs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119053 [14:44:51] greg-g: I can take the SWAT deploy today if no one else is jumping up and down [14:45:13] (03CR) 10coren: [C: 032] "Less is more?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119053 (owner: 10coren) [14:48:49] (03CR) 10BryanDavis: "Sam's hack of removing the $BINDIR specification works because there is another mwversionsinuse in the search path:" [operations/puppet] - 10https://gerrit.wikimedia.org/r/118946 (owner: 10Reedy) [14:50:01] manybubbles: I was about to ping you and MaxSem to see which of us wanted to do it. If you want to do it, that's fine with me as I have no burning desire to. The one patch on the list today LGTM. [14:50:12] (03CR) 10Reedy: "Fixing it properly certainly sounds more sensible ;)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/118946 (owner: 10Reedy) [14:50:22] sweet [14:50:59] ebernhardson: looks like we're ready to do the SWAT deploy in ten minutes for your change [14:52:37] is someone fiddling with gerrit? [14:54:12] manybubbles: that'd be https://gerrit.wikimedia.org/r/#/c/118741/, right? 
[14:54:43] I'd +1 it, but wifi/gerrit is slow/unresponsive [14:54:59] * aude can't access gerrit [14:55:15] hey anomie, go ahead. I'll push a few mobile conf changes later [14:55:44] mlitn: yeah, that is the one. [14:55:52] grrit! [14:55:55] why you slow [14:56:51] might be simplest if extension maintainer +2 it into the release branch of the extension then I build the submodule update for production and +2 that myself [14:57:13] PROBLEM - DPKG on analytics1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:57:15] so, that means you (or someone else in flow) +2 it [14:57:18] oh, sure, I'll +2 instead ;) [14:57:42] thanks! [14:59:27] +2'ed [15:00:01] thanks [15:00:26] my submodule update one core is taking a while so I won't be able to deploy it for a few minutes. because, submodules [15:00:31] ^d: ^^ :( [15:01:25] don't worry, I wasn't planning on leaving in the next minute ;) [15:01:52] <^d> Hmm? [15:02:03] geeeerit! [15:02:25] <^d> meh. [15:03:40] manybubbles: yessir, that's mostly your call :) [15:03:44] I should really kick off the submodule update 20 minutes before hand. that way it is fast to get just what I need [15:04:44] greg-g: it's great to have a deployment time slot this early in the day :) [15:04:58] (have nothing for today) [15:05:15] aude: you guys were part of the reasoning :) [15:05:31] great [15:05:33] holy crap, gerrit is dead in the water for me [15:07:05] <^d> wfm. [15:07:22] (03PS1) 10Chad: Revoke my gerrit admin privs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119060 [15:07:24] <^d> See ^ [15:08:42] * bd808 looks around to see who he'll bug about gerrit if that patch is merged [15:08:52] wait, what? [15:08:53] hah [15:08:54] yep, works for me now [15:09:27] anomie, are you deploying? 
[15:09:33] MaxSem: manybubbles is [15:09:40] ah [15:09:53] manybubbles, if you're not ready yey, I'd like to push https://gerrit.wikimedia.org/r/#/c/114634/ [15:10:23] MaxSem: sounds good to me [15:10:31] you have the conch [15:10:35] paravoid: don't worry, ^d is known for making such patchsets [15:10:49] ^d: and don't think it'll work: we'll still bug you even if you didn't have admin rights [15:11:06] <^d> Yeah, but it's easier for my to wash my hands and say "Sorry, can't help!" [15:11:18] <^d> "I totally would, if I had access!" [15:11:23] (03CR) 10MaxSem: [C: 032] Don't collapse sections on Wiktionaries [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114634 (owner: 10MaxSem) [15:11:38] ^d: then you'll just be a liar! [15:12:35] (03Merged) 10jenkins-bot: Don't collapse sections on Wiktionaries [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/114634 (owner: 10MaxSem) [15:12:59] (03CR) 10Chad: [C: 031] "Less users with sudo is good for security. Please merge immediately." [operations/puppet] - 10https://gerrit.wikimedia.org/r/119060 (owner: 10Chad) [15:13:10] MaxSem: I'm ready to +2 the wmf18 core change to get flow deployed. let me know when you are done with your config change. [15:13:45] (03Abandoned) 10Chad: Revoke my gerrit admin privs [operations/puppet] - 10https://gerrit.wikimedia.org/r/119060 (owner: 10Chad) [15:13:48] !log maxsem synchronized wmf-config/ 'https://gerrit.wikimedia.org/r/#/c/114634/' [15:13:52] Logged the message, Master [15:13:59] manybubbles, ^^ [15:14:10] MaxSem: thanks! [15:14:22] * manybubbles has the conch [15:15:19] ^d: "fewer" [15:15:59] <^d> My English teachers never managed to drill that one into me. [15:17:13] PROBLEM - jmxtrans on analytics1021 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args -jar jmxtrans-all.jar [15:18:49] (03CR) 10BryanDavis: "We really do need to get broader coverage for gerrit support. 
Chad and Antoine get pinged to look into slowness in the gerrit/zuul/jenkins" [operations/puppet] - 10https://gerrit.wikimedia.org/r/119060 (owner: 10Chad) [15:19:37] ^d: are you that pissed off by Gerrit ? [15:19:50] mlitn: ready to sync [15:20:05] <^d> hashar: Is it a day that ends in Y? [15:20:15] not in my language :D [15:20:35] you can surely brain dump your Gerrit knowledge and spread it to a bunch of folks in mwcore team [15:20:39] ping me and I'll mash the enter key and it'll go [15:20:40] I will be happy to help [15:21:04] ^d: I've only recently-ish gotten that one straight (fewer/less) [15:21:23] manybubbles: okidoki [15:22:14] <^d> hashar: I like the people on my team--why would I want to make them suffer with Gerrit? [15:22:32] !log manybubbles synchronized php-1.23wmf18/extensions/Flow/ 'SWAT Backport for fatal on Special:Contributions' [15:22:44] ^d: to share the pain? :-] [15:22:51] mlitn: all done [15:22:55] looks better [15:22:57] ^d: this way you will get more "free" time to work on a Gerrit replacement [15:23:03] manybubbles: great thanks [15:23:08] problem seems fixed :) [15:23:17] * manybubbles puts conch back into center of circle [15:27:13] RECOVERY - jmxtrans on analytics1021 is OK: PROCS OK: 1 process with command name java, args -jar jmxtrans-all.jar [15:27:53] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [15:33:40] manybubbles: doh, i thought that was 4pm for some reason [15:33:59] no big deal [15:34:22] mlitn had you covered [15:34:25] excellent :) [16:03:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:05:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:07:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:09:52] PROBLEM - Puppet 
freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:11:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:13:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:15:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:17:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:19:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:21:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:23:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:25:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:27:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 03:58:10 PM UTC [16:28:02] RECOVERY - Puppet freshness on analytics1026 is OK: puppet ran at Mon Mar 17 16:28:01 UTC 2014 [16:29:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 04:28:01 PM UTC [16:31:52] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 04:28:01 PM UTC [16:36:09] bd808: Sort of! [16:41:34] PROBLEM - Puppet freshness on labsdb1004 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 01:40:58 PM UTC [16:45:51] hi rachel, had fun? 
:) [16:48:14] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000 [16:58:34] RECOVERY - Puppet freshness on analytics1026 is OK: puppet ran at Mon Mar 17 16:58:26 UTC 2014 [17:02:14] RECOVERY - Puppet freshness on labsdb1004 is OK: puppet ran at Mon Mar 17 17:02:07 UTC 2014 [17:20:34] PROBLEM - Puppet freshness on dataset2 is CRITICAL: Last successful Puppet run was Mon 17 Mar 2014 02:19:32 PM UTC [17:21:04] PROBLEM - Varnishkafka Delivery Errors on cp3012 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 57.466667 [17:31:38] springle: hi [17:32:02] probably away, I'll mail [17:32:24] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2791.93335 [17:33:04] RECOVERY - Puppet freshness on dataset2 is OK: puppet ran at Mon Mar 17 17:32:55 UTC 2014 [17:35:04] RECOVERY - Varnishkafka Delivery Errors on cp3012 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:35:22] akosiaris, since you are on rt duty... https://rt.wikimedia.org/Ticket/Display.html?id=6961 [17:35:24] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [17:36:10] gwicke: ok, I look into it [17:41:10] !log re-enabled dataset2 puppet, move done [17:41:18] a little after the fact, oh well [17:43:47] where is grrrit-wm ? [17:44:06] <^d> We got rid of it. Decided we didn't like bots. [17:45:00] great ^d :) [17:45:46] which channel is it this time?:) [17:47:13] #wikimedia-operations-bots-only-watch-out-for-spam [17:47:58] apergos: "move done" ? wot? 
[17:48:02] well [17:48:06] not exactly; [17:48:14] so there is the hostname (download.wm.o) [17:48:14] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000 [17:48:20] and there is a user (erik z) [17:48:55] which I have turned off his rsyncs but I'm reluctant to just edit scripts of his on stat1002, maybe I'd miss some, so I might have to turn those back on til I can get a hold of him [17:48:56] is the hostname moving to the new host for downloads [17:49:02] yes [17:49:04] where the mw-release people have access [17:49:06] no [17:49:13] :-D [17:49:16] heh [17:49:18] releases.wm.o is that host [17:49:27] downloads will move to eqiad is all [17:49:29] ok [17:49:49] but all the cron jobs are in and moved and puppeted up, there's one where I need to do a bunch of back runs on it before I can enable it [17:50:26] cool! [17:50:29] anyways... right now if there were an easy way to reach erik z in real time and say 'hey can you point your stuff to dataset1001', I would love it [17:50:52] put a banner on stats.wm.org :) [17:50:59] hahaha noooooooo [17:51:16] motd [17:51:19] wall [17:51:59] and if he doesn't log in for 3 days then what? [17:52:39] check contacts page on office for phone number and text him :p [17:54:14] RECOVERY - DPKG on analytics1004 is OK: All packages OK [17:54:59] milimetric: ping [17:55:17] hi paravoid, what's up [17:55:19] hi [17:55:37] are you on verizon right now? [17:55:51] mark: sent you a mail [17:56:04] yes paravoid, on verizon [17:56:12] apergos: I'm told Skype can work [17:56:19] can you give me your IP (in private if you prefer)? [17:56:32] a traceroute to e.g. bast1001 wouldn't hurt, but I'm more interested in your IP [17:56:50] 108.16.110.132 paravoid, tracerouting now [17:58:12] heh, sorry, no traceroute installed - one sec [17:58:28] and do you have ipv6? 
[17:58:37] no, I don't [17:58:51] ok, thanks [18:28:34] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [18:57:42] jgage: around ? [18:58:42] hi [18:59:27] hi, regarding RT 7050, i have something i have been using for some time [18:59:49] do you have a few secs to glance at it ? [19:01:11] re:RT 7050: ip link show [19:01:31] if NO-CARRIER then bad [19:02:13] * jgage looks [19:02:30] mutante: for that you need to be inside the device or poll using snmp [19:02:57] what's the alternative, monitor what the router has negotiated for that port? [19:03:25] i think a normal snmp query will be enough [19:03:50] iirc query 1.3.6.1.2.1.2.2.1.14 [19:04:05] for ifInErrors [19:04:21] * matanya is googling to see if he remembers the oid correctly [19:04:40] disregard the command i said, but we can execute whatever we want locally and then send out to NSCA (as alternative to snmptrap) [19:04:49] nrpe should be fine [19:05:18] running this under netmon1001 (just an e.g.) using nrpe will do [19:06:03] we don't have snmpd running on our servers nor do we intend to [19:06:28] paravoid: we do have snmptrapd [19:06:45] not on every box [19:06:46] then just nrpe and if for some reason you want passive then nsca [19:07:32] so how do you plan to query paravoid ? [19:07:33] looks like we can get the value directly from /sys, though hopefully there's a nicer path: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/net/eth0/speed [19:07:45] /sys/class/net/eth0? [19:08:02] but you need /duplex too [19:08:16] ah yeah, perfect [19:08:30] and you need to only do the check for all nics where /carrier == 1, as we have tons of boxes with unplugged cards [19:09:09] are supported speeds anywhere under sysfs, I wonder [19:09:23] checking for 100mbps is easy, but you should also check if e.g. a 10g card has negotiated at 1gbps [19:09:32] that shouldn't happen as often [19:09:49] but if we're going to have a check... 
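Editor's sketch of the sysfs check discussed above, not something posted in the channel: it reads `carrier`, `speed` and `duplex` under `/sys/class/net/<iface>/`, skips interfaces without carrier, and returns a Nagios-style status. The function name `check_links`, the single `min_speed_mbps` threshold and the message format are assumptions made for illustration; the channel left open how the expected speed per card would be determined.

```python
import os

OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def _read(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Reading speed/duplex can fail on interfaces that are down.
        return None


def check_links(sysfs_root="/sys/class/net", min_speed_mbps=1000):
    """Check speed and duplex of every interface that reports carrier == 1.

    min_speed_mbps is a hypothetical site-wide threshold, not a value
    taken from the discussion.
    """
    problems = []
    checked = 0
    for iface in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, iface)
        if iface == "lo" or not os.path.isdir(base):
            continue
        # Only look at NICs with a cable plugged in; unplugged cards are skipped.
        if _read(os.path.join(base, "carrier")) != "1":
            continue
        checked += 1
        speed = _read(os.path.join(base, "speed"))
        duplex = _read(os.path.join(base, "duplex"))
        try:
            speed_mbps = int(speed)
        except (TypeError, ValueError):
            speed_mbps = -1  # treat unreadable speed as a problem
        if speed_mbps < min_speed_mbps:
            problems.append("%s speed=%s" % (iface, speed))
        if duplex != "full":
            problems.append("%s duplex=%s" % (iface, duplex))
    if problems:
        return CRITICAL, "CRITICAL: " + ", ".join(problems)
    return OK, "OK: %d link(s) with carrier at expected speed/duplex" % checked
```

Run via NRPE, a wrapper would print the message and exit with the returned status. Virtual interfaces (bridges, veth pairs) may report no usable speed and would need to be excluded, and a 10G card negotiated at 1G would only be caught with a per-interface threshold.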
[19:10:41] /sys/class/net/eth*/statistics/ has tons of metrics that we should check too, but these are probably better being check_graphited on or ganglios or whatever [19:14:48] ethtool will show supported link modes as well as current speed [19:15:59] yes [19:16:25] check exchange.nagios.org for existing ones? some are crap but some are not:) [19:16:32] looks like it's not getting that info from proc/sys though [19:16:44] netlink, most probably [19:16:59] yeah [19:17:00] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 [19:17:01] fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 [19:22:14] where is morebots? [19:22:29] mehrbots [19:23:13] mehrroboter :p [19:23:18] I don't know either [19:23:29] he died [19:23:40] Coren/andrewbogott: any idea on why we're getting Wikitech spam at our noc@ alias? [19:23:49] Subject: Wikitech page Nova Resource:I-0000017b.eqiad.wmflabs has been changed by anonymous user 127.0.0.1 [19:23:53] etc. [19:24:13] ori: grrrit-wm is also gone. have all bots been moved out of channel? [19:24:25] Coren is moving tools to eqiad [19:24:25] morebots is bad, but not worse than my German [19:24:28] paravoid: probably wikitech is configured to do that, although I don't know where... [19:24:40] it only happened recently, I'm guessing eqiad [19:25:04] Hm... [19:25:14] ori: labs migration related? [19:25:35] well, that's a notice of an instance being created. But as far as I know that's configured to only update at virt0 [19:25:36] probably [19:25:37] which is in tampa [19:26:04] paravoid: Someone set the labslogbot wikitech account's email to noc@, and echo got turned on. [19:26:17] oh heh [19:26:38] That someone was probably me, but if I did it it was ages ago [19:26:38] ori: definitely. Go ask Andreas Bogott [19:26:55] please fix :) [19:27:18] Oh, probably the issue is some echo change… anyway, yes, I will fix! [19:27:24] thanks! [19:27:57] mutante: I think it's great to speak German with you. 
if you have patience for my German, maybe we can speak German sometimes [19:29:42] paravoid: let me know if it keeps happening... [19:29:54] will do, thanks :) [19:31:18] ori: No problem, first lesson: declension of "to download" in German: "ich downloade, du downloadest, er/sie/es downloadet, wir downloaden..". We use a lot of [[en:Denglisch]] in IT. [19:32:23] ori: http://en.wikipedia.org/wiki/Denglisch#Pseudo-anglicisms [19:32:34] ori: mobile phone = Handy :) [19:32:57] springle: ping? [19:33:17] paravoid: pong [19:33:22] root@labsdb1003:~# du -hs /a/lost+found/ [19:33:22] 19G /a/lost+found/ [19:33:48] 9109 files [19:34:30] but they look old, Aug 2014 [19:34:35] er, 2013 obviously [19:34:41] <^d> Old? That's the future! [19:34:49] Sep 2013 the newest ones [19:34:51] that can be cleaned out. the datadir was rebuilt since then [19:35:01] ok, done [19:35:05] thanks [19:36:20] springle: i need your help with puppet3 related stuff [19:36:37] most things left on my list are DB related [19:36:59] ok [19:37:09] springle: https://etherpad.wikimedia.org/p/Puppet3 [19:37:27] paravoid: yours with the monitoring patch you -1ed [19:37:39] Reedy or anybody else interested in a quick config push? https://gerrit.wikimedia.org/r/#/c/118220/ <- adds app indexing hint links for google search spider on mobile site [19:37:46] https://gerrit.wikimedia.org/r/#/c/118435/ [19:37:52] thanks in advance :D [19:38:44] brion: you can't deploy that? [19:38:59] paravoid: i'm not up on the current config deployment system, i don't want to fuck it up :D [19:38:59] I thought all deployers could push mediawiki-config? 
[19:39:03] springle: https://gerrit.wikimedia.org/r/#/c/108488 <-- is the blocking one [19:39:04] oh [19:39:06] i probably *can* [19:39:08] not sure i *should* ;) [19:39:12] yeah I'm on the same group I think [19:39:14] let the SWAT team do it ?:) [19:39:53] <^d> brion: You need a lesson :D [19:40:39] paravoid: regarding puppet3, good progress, would be nice if a few more can give a hand for a final push [19:40:50] matanya: awesome, thanks [19:41:03] I'm mostly waiting on akosiaris since he has this magic script that makes sure changes are noop [19:41:16] him merging your changes or mergin the magic script ;) [19:41:23] akosiaris: wink wink nudge nudge [19:41:31] requests slots on the magic script too [19:41:36] he can't check stuff i didn't fix since i don't know all the stuff [19:41:37] akosiaris: :) https://gerrit.wikimedia.org/r/#/c/118790/ [19:42:08] * gwicke contends for akosiaris' attention to perhaps get the parsoid deploy process fixed before the one starting in 20 minutes [19:42:26] well, i'll do 18 lines manually, but fwiw, if i make large changes i get "too large" comments and if i make them small "too small":) (now that we have the script:) [19:42:54] so paravoid , i pushed all the stuff i know/found out about [19:43:48] but most if what is left is above my knowlegde/paygrade :) [19:43:48] *of [19:45:34] matanya: in maintenance.pp, you commented on the "selector inside resource", remember [19:46:24] i did that for the new one, but if you wanted to treat all the existing ones the same way.. 
there would be more todo [19:50:09] ok folks i've added the app indexing config tweak to swat deployments todo for later today on https://wikitech.wikimedia.org/wiki/Deployments [19:50:13] if anyone objects let me know :D [19:51:16] mutante: i'm looking at it atm [19:51:34] matanya: pt1 first [19:53:12] mutante: i'm going to comment on every eye itch, take what is useful for you [19:54:40] no grrit wm is already not so nice [19:54:56] doesn't notice anymore [20:01:36] AFTv5 is gone, right? [20:04:45] Yup [20:04:49] Party time [20:05:01] mutante: i killed me [20:05:04] *you [20:05:19] matanya: what?:) [20:05:28] this lint change [20:05:37] i don't like half work [20:06:21] ok,if i make large changes they are too massive, when i split it up into several parts they are "half work", no way to do it right [20:07:02] yup [20:07:04] you'll tell me to git squash, aren't you:) [20:07:11] no [20:07:13] heh [20:07:18] just ranting in general [20:07:47] the whole lint of that file is a reply to https://gerrit.wikimedia.org/r/#/c/81257/ [20:07:54] ok [20:08:24] once we have "magic script" i'll make them huge again, deal?:) [20:08:28] that's what akosiaris said as well [20:08:59] ok, commented [20:09:24] !log deployed Parsoid d0f0080a [20:09:43] any roots around to call 'dsh -g parsoid service parsoid restart' ? [20:09:55] i can [20:10:08] thanks! [20:10:51] * gwicke heads to SAL to manually add an entry [20:10:51] !log rolling restart of parsoid service per gwicke's request [20:10:51] oh, bots dead... [20:10:51] why is bot deaD? [20:11:20] paravoid: can you please update the list of https://etherpad.wikimedia.org/p/Puppet3 ? [20:11:38] it is a few moths old, and stuff change quite a bit [20:11:40] gwicke: its running now [20:11:54] a LOT faster than last time. [20:12:18] gwicke: done! [20:12:42] robh, thanks! [20:14:02] so what happened to morebots? [20:14:32] i bet labs migration robh [20:14:51] robh: how long is our hardware warranty usually? 4 years? 
[20:14:51] yeah, I think there is some post on wikitech about it [20:15:09] paravoid: 3 years [20:15:25] (imaginary) !log DNS update, removing ersch [20:15:26] and we don't extend it? [20:16:01] paravoid: not on servers. we keep support contracts for network gear extended and valid [20:16:12] servers we purchase witht he 3 year and normally never extended [20:16:35] thanks mutante below 20 pending changes now [20:16:47] robh: ok, thanks [20:16:52] paravoid: we've also not (in recent memory) reviewed if we should make that longer or shorter [20:17:04] if the lifespan of a server is worth renewing it past those [20:17:13] usually we just let them expire and swap parts around as they die. [20:18:10] Coren: So morebots runs in toollabs, is that stuff supposed to be back online at this point? [20:19:47] robh: That depends strictly on whether its maintainer(s) have migrated it by hand. If so, then yes. Otherwise, the stragglers' migration is in progress and won't be done for several hours. [20:20:15] ok, so if not, should be able to go back online once you force it over later, ok... [20:20:31] Right. [20:22:03] seems liek the files are there. [20:23:24] chasemp: any news regarding https://gerrit.wikimedia.org/r/#/c/111189/ ? [20:25:28] one more DNS update - removing solr1-3 - already shut down [20:26:00] hashar: https://gerrit.wikimedia.org/r/#/c/118966/ [20:30:34] Gloria: https://gerrit.wikimedia.org/r/#/c/119109/ I guess? [20:43:09] matanya: not reviewing anymore this week sorry. unless it is production urgent. [20:43:29] fair enough, though week just started :P [20:43:34] thanks for the note [20:44:07] matanya: I have spent 4 hours today triaging my last week emails :/ [20:44:23] oh boy [20:44:45] <^d> It's not friday yet? :( [20:46:01] we are working on it [20:47:11] apergos: Are there known issues with current dumps? Vanadium? Gadolinium? 
At least pagecount-dumps stopped working since 14:00 [20:47:42] I turned off those cron jobs but they should be kicking in again on the new server [20:47:46] let me see how that is [20:51:01] Reedy, brion : shall I remove the isset or not? anyone wants to decide? [20:51:10] on https://gerrit.wikimedia.org/r/#/c/118220/5/wmf-config/mobile.php [20:51:37] :) [20:51:44] i'll take it out [20:51:45] dennyvrandecic: Personally, I'd remove the whole conditional [20:51:57] Leave the comment and the assignment below it [20:52:30] that works [20:52:30] brion: thx [20:52:54] yay, less lines of code! [20:52:54] Reedy: patch set updated [20:52:55] yep, needed a directory created over there [20:52:56] :D [20:52:56] meh [20:53:06] I'm going to be doing this for the next day I bet [20:53:43] apergos: fine [20:54:15] more DNS update - remove tmh1-2, already shut down [20:55:12] matanya: I'm a ways off from that, I will hit you up when I come around to it. still trying to understand the context. [20:55:34] thanks chasemp [21:00:53] is anyone opposed to me adding 'tree' to 'base::standard-packages'? [21:01:06] I feel like maybe a diff for that is overkill but asking isn't? [21:01:37] https://en.wikipedia.org/wiki/Tree_(Unix) [21:03:43] !log upgrading varnishkafka on bits and mobile hosts to 1.0.2-1 (includes fix for unexpected whitespace in headers and logrotate cron spam) [21:03:43] chasemp: it would be a gerrit change anyways (re: diff) [21:04:09] sure but absent any naysayers I could just self approve and merge? 
[21:04:41] not sure, standard-packages have been discussion in the past [21:04:45] matanya: hey [21:04:47] adding to them i mean [21:04:48] matanya: https://etherpad.wikimedia.org/p/Puppet3 is updated [21:05:04] thanks [21:05:24] PROBLEM - Varnishkafka log producer on cp3011 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [21:05:27] much smaller :) [21:06:24] PROBLEM - DPKG on cp1046 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:06:24] PROBLEM - DPKG on cp4011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:06:34] PROBLEM - Varnishkafka log producer on cp1046 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [21:07:04] PROBLEM - Varnishkafka log producer on cp4011 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [21:07:24] RECOVERY - DPKG on cp4011 is OK: All packages OK [21:08:04] RECOVERY - Varnishkafka log producer on cp4011 is OK: PROCS OK: 1 process with command name varnishkafka [21:08:24] RECOVERY - DPKG on cp1046 is OK: All packages OK [21:08:34] RECOVERY - Varnishkafka log producer on cp1046 is OK: PROCS OK: 1 process with command name varnishkafka [21:12:04] hedonil: thosefilesshould be showing up in a while, once the rsync to the tampa mirror fires off again [21:12:19] thanks for pointing it out, got it solvd sooner than my review tomorrow [21:12:21] apergos: thanks! [21:15:24] RECOVERY - Varnishkafka log producer on cp3011 is OK: PROCS OK: 1 process with command name varnishkafka [21:29:34] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Tue 11 Mar 2014 08:47:37 PM UTC [21:36:45] MaxSem: ping? 
[21:36:52] pong [21:36:54] hi [21:36:58] hi [21:37:15] we got a request to enable the mdot for legalteam wiki [21:37:33] I added it to DNS and I intend to add it to the mobile redirects, but before that I think we need some mediawiki-config change [21:37:38] that I'm not able to pinpoint :) [21:37:49] well, I haven't looked hard enough, I'm lazy and you're around [21:38:36] actually it looks like mobile view link already works properly [21:38:40] I thought that too... I ended up not finding it and it appeared to possibly not actually be needed (it looked like it was 'always working' the dns just needed to be pointed) [21:39:01] so what remains to be done is redirection regex [21:39:04] [but I was also in the enviable position of making it someone elses problem ;) ] [21:40:08] it appears to work like normal... does it need any more regex? [21:40:18] I'll add it and also add office while at it [21:40:23] I wonder why we don't have that [21:40:48] ahh /me realizes why it's needed and shuts up [21:43:15] MaxSem: https://gerrit.wikimedia.org/r/119205 [21:45:51] paravoid, +1d [21:46:09] I saw, merged already ;) [21:46:22] grr grrrit-wm [21:46:51] !log springle synchronized wmf-config/db-eqiad.php 's1 depool db1037' [21:47:09] thanks [21:47:42] I wonder why HTTP->HTTPS redirects aren't mdot-aware [22:03:01] !log xtrabackup clone db1005 to db1037 [22:07:29] watches puppet run on terbium because i merge(d) maintenance.pp changes [22:07:40] springle: no bot day [22:07:54] bot-free! wheee [22:08:12] aha [22:08:20] nice!:) [22:08:27] i wanted grrrit-wm , thx [22:09:42] tools.lolr = eqiad?:) [22:10:18] mutante: Yep. [22:10:27] Coren: 'grats!:) [22:10:46] to the whole tool migration i mean [22:11:34] mutante: Not quite done yet, don't jinx it! :-) [22:12:23] ok. but one bot was a good sign:) [22:16:03] mutante: There were way more migrated in the past two weeks; that's just a poor little orphan that had been abandonned by its straggler maintainers. 
:-) [22:16:31] ok,gotcha [22:16:56] where maintainer = unknown [22:17:15] No, known but doesn't keep up with labs-l, for the most part. :-) [22:17:41] ok:) [22:17:51] Or, in this particular case, "three maintainers all persuming one of the other two would do it" :-) [22:18:01] that needs the xkcd link i got from robh the other day [22:18:27] 1339 :) [22:18:54] Ah, a recent one. :-) [22:20:04] one of the best [22:20:56] along with https://xkcd.com/705/ and http://xkcd.com/149/ [22:26:54] ops, any idea why we would have an edit from one of googls name servers ? [22:26:57] https://en.wikipedia.org/w/index.php?title=Wikipedia:VisualEditor/Feedback&diff=prev&oldid=572188521 [22:27:10] 8.8.8.8 [22:29:19] mutante: a trivial fix please : https://gerrit.wikimedia.org/r/#/c/119212/ [22:33:10] Maybe a Googler decided to voice their opinion on VisualEditor, thedj [22:34:09] thedj: :-D [22:35:05] thedj: It's most likely that it was a DNS resolution failure. [22:35:51] Not a friendly Googler then? :-( [22:36:19] All Googlers are friendly, I'm sure. [22:36:25] But it seems unlikely. [22:39:20] James_F: I like the reasoning though ;) [22:39:42] dennyvrandecic: :-) [22:39:57] dennyvrandecic: OK, all Googlers, present company excluded, are lovely. ;-) [22:40:37] James_F: :D [22:51:24] greg-g: Do we know who's doing this afternoon's SWAT deploy? [22:52:38] whoever stands ujp [22:52:40] -j [22:53:13] RoanKattouw: and the search thing is mis-bucketed, I'll move it to where it should be, this morning's [22:57:30] James_F: perhaps from the ulsfo setup or something [22:57:57] thedj: Perhaps… [22:58:55] greg-g: I can do it but I'm still wrapping up a meeting now [22:59:37] RoanKattouw: What meeting? 
[22:59:44] RoanKattouw: Oh; … [22:59:46] James_F: The one in my calendar [22:59:50] heh [22:59:59] # Puppet Name: purge_securepoll [23:00:04] is running on terbium now [23:00:13] RoanKattouw: it's brion's, he may be ok with you delaying a bit [23:00:22] yo yo [23:01:18] RoanKattouw: BTW, grrrit-wm may be in IRC but it sure as hell isn't reporting gerrit activity. [23:01:32] Is labs migration over? [23:01:58] puppet wizards, I'd use some help with https://gerrit.wikimedia.org/r/119216 ; not going to put much more time in it [23:04:51] James_F: Urgh. I restarted it and everything [23:05:13] RoanKattouw: Is it pointing at the wrong IP for gerrit or something? [23:05:20] Unlikely [23:05:22] It moved, not Gerrit [23:05:31] Ping Yuvi when he wakes u [23:05:33] p [23:05:39] RoanKattouw: 316bf6c48e5b15225ba1af03dfa860e2eed23943 [23:05:41] gah [23:05:44] https://wikitech.wikimedia.org/wiki/Grrrit-wm#Debugging_stuck_stream [23:06:04] gerrit-to-redis also needs to be kicked [23:06:04] greg-g: Wait WTF why does the calendar say that SWAT windows are at 3pm? [23:06:19] greg-g: Never mind, I'm an idiot, that was the UTC column [23:06:42] :) [23:06:54] (WTF is your UTC only 1 hour off? :P ) [23:07:11] oh, the am one [23:07:16] 03:00 [23:07:18] greg-g: The other window :) [23:07:57] aude: Looking [23:09:08] !log catrope updated /a/common to {{Gerrit|Ia55be2c4c}}: Set up configuration for App indexing [23:09:18] thedj, hard to tell so long after [23:09:58] if it were yesterday we would have had logs to look up WTF's with his XFF [23:12:27] !log catrope synchronized wmf-config/InitialiseSettings.php 'Add wmgMFAppPackageId' [23:13:40] Coren: did COIBOt fail today due to your doing things on Labs? [23:14:02] sDrewth: I don't know; was it properly migrated before the deadline? [23:14:42] !log catrope synchronized wmf-config/mobile.php 'Add wmgMFAppPackageId' [23:15:10] sDrewth: If not, then yes -- it's being migrated in batch with all the stragglers. [23:15:44] brion: anything break? 
[23:15:51] let's see :D [23:16:14] looks good \o/ [23:16:22] and yeah that's supposed to be http slash [23:16:30] weird [23:16:34] but cool! [23:16:41] first day of SWAT: success! [23:17:16] thanks all :D [23:19:19] dennyvrandecic: please confirm it looks right on your end :D [23:19:22] links should be live [23:19:32] (except where the previous html is cached) [23:19:46] greg-g: you're ok with us tomorrow first slot? per https://wikitech.wikimedia.org/wiki/Deployments [23:19:53] brion: wohoo! I will check it out, but I will need the team to confirm [23:19:58] ok :D [23:19:59] thank you! [23:20:02] aude: community concensus? [23:20:04] :) [23:20:04] enabling guided tours on wikidata (it's already on test.wikidata) [23:20:05] springle, that blocking query on s1-analytics won't seem to die (stuck in State="query end"). Got a second to take a look? [23:20:06] yes [23:20:09] yeah [23:20:30] what is swat ? [23:20:42] http://test.wikidata.org/wiki/Wikidata:Tours [23:20:51] https* [23:20:54] thedj: https://wikitech.wikimedia.org/wiki/SWAT_deploys [23:20:58] * greg-g runs [23:22:33] ah. good to know. [23:22:57] Ugh, I restarted gerrit-to-redis (which wasn't running) and restarted lolrrit-wm but it's still not working :( [23:23:10] then yuvi has to fix [23:23:32] could be labs is not working quite right now [23:24:42] Wat [23:24:43] error: server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none while accessing https://git.wikimedia.org/git/labs/tools/grrrit.git/info/refs [23:24:45] fatal: HTTP request failed [23:24:58] brion: what can you tell me about the 'embed-sandbox' labs project? Is it still in use? 
[23:25:37] andrewbogott: it's a couple static files [23:25:44] so shouldn't need any fancy restarting or porting :D [23:25:54] mainly i needed a unique domain name for a research project [23:26:00] which i keep meaning to get back to [23:26:04] there's the bot [23:26:11] andrewbogott: https://www.mediawiki.org/wiki/Extension:EmbedSandbox [23:26:27] kill it if you have to [23:26:29] but let me know [23:26:41] brion: so... [23:27:09] Today I'm starting to shut down and store labs projects that are 'unclaimed'. Which yours is, at the moment :) Would you like me to migrate it to eqiad instead? [23:28:08] andrewbogott: if it's no trouble yes please :D [23:28:18] should be easy, I'll have a go [23:28:21] \o/ [23:28:23] thanks [23:29:35] Hmm, the Redis thing seems to be working but lolrrit seems to not be connecting to it [23:32:36] Coren, I don't know whether it was migraated or not. I don't see it. It wasn't showing on the list either way that I saw last week [23:32:49] coren I will just ping beetstra [23:33:07] sDrewth: That's your best bet. [23:36:24] halfak: it is busy rolling back: ---TRANSACTION 8A2A845D6, ACTIVE 616248 sec rollback [23:36:30] not much we can do except wait [23:36:37] OK> Thanks for your help [23:36:39] or kill mysqld [23:37:09] !log aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Added TMHTransformFrame pool counter config' [23:39:03] halfak: did you notice: undo log entries 70951430693 ... wow! :) [23:39:17] Not sure what that means [23:39:59] currently debugging something that happened the other week, Were there any ExternalStore outages on 2-25 or somewhere i could find out? [23:43:18] springle: what's the deal with the massive amount of log entries? [23:48:16] halfak: the undo log is part of multi-versioning. just means this big transaction will take ages to unwind that log [23:48:26] if you're keen: https://dev.mysql.com/doc/refman/5.5/en/innodb-multi-versioning.html [23:48:51] Thanks. 
[23:51:07] halfak: will try to speed it up, but have to reduce other traffic for a while [23:51:12] kills coming [23:52:47] oh, man… brion, that's a lucid instance… hard to get it to play with our new storage system. [23:53:06] So, it's migrated but your homedir will be empty on that instance... [23:53:14] andrewbogott: that's fine [23:53:27] ok. New instances in that project will work fine. [23:53:28] i can restore any files from git :D [23:53:37] awesome thanks [23:55:59] ori, do you know about the 'eventlogger' project? [23:56:19] aka flog.wmflabs.org