[00:00:47] PROBLEM - Hadoop DataNode on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:00:47] PROBLEM - salt-minion processes on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:00:56] PROBLEM - Disk space on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:01:26] PROBLEM - SSH on analytics1017 is CRITICAL - Socket timeout after 10 seconds [00:01:37] PROBLEM - Hadoop NodeManager on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:01:37] PROBLEM - dhclient process on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:01:56] PROBLEM - DPKG on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:02:06] PROBLEM - configured eth on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:02:07] PROBLEM - puppet last run on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:02:07] PROBLEM - RAID on analytics1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:13:47] (03CR) 10Dereckson: "Bug T93798 is about "Give patrol to reviewers on testwiki", your commit is about to give that right to en. and test." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199321 (https://phabricator.wikimedia.org/T93798) (owner: 10Cenarium) [00:15:15] (03CR) 10Dereckson: "Oh, I see, they currently share the same configuration." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/199321 (https://phabricator.wikimedia.org/T93798) (owner: 10Cenarium) [00:17:56] PROBLEM - Host analytics1017 is DOWN: PING CRITICAL - Packet loss = 100% [00:19:04] i'll check out analytics1017, it does indeed appear to be down [00:20:14] [12553121.825854] CPU17: Core temperature above threshold, cpu clock throttled (total events = 26867694) [00:20:17] [12553121.840802] CPU5: Core temperature above threshold, cpu clock throttled (total events = 26868295) [00:26:57] !log analytics1017 unresponsive, console reported high temps. rebooted. [00:27:04] Logged the message, Master [00:27:06] RECOVERY - SSH on analytics1017 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0) [00:27:16] RECOVERY - Host analytics1017 is UP: PING OK - Packet loss = 0%, RTA = 0.36 ms [00:27:26] RECOVERY - Hadoop NodeManager on analytics1017 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [00:27:27] RECOVERY - dhclient process on analytics1017 is OK: PROCS OK: 0 processes with command name dhclient [00:27:37] RECOVERY - DPKG on analytics1017 is OK: All packages OK [00:27:47] RECOVERY - configured eth on analytics1017 is OK - interfaces up [00:27:47] RECOVERY - puppet last run on analytics1017 is OK Puppet is currently enabled, last run 44 minutes ago with 0 failures [00:27:47] RECOVERY - RAID on analytics1017 is OK no disks configured for RAID [00:28:17] RECOVERY - Disk space on analytics1017 is OK: DISK OK [00:28:17] RECOVERY - Hadoop DataNode on analytics1017 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [00:28:17] RECOVERY - salt-minion processes on analytics1017 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [00:33:58] PROBLEM - DPKG on analytics1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [00:35:02] ^ i'm upgrading [00:37:16] RECOVERY - DPKG on analytics1017 is 
OK: All packages OK [02:22:32] !log l10nupdate Synchronized php-1.25wmf24/cache/l10n: (no message) (duration: 06m 32s) [02:22:43] Logged the message, Master [02:27:23] !log LocalisationUpdate completed (1.25wmf24) at 2015-04-13 02:26:20+00:00 [02:27:31] Logged the message, Master [02:43:12] !log l10nupdate Synchronized php-1.26wmf1/cache/l10n: (no message) (duration: 05m 40s) [02:43:18] Logged the message, Master [02:47:36] !log LocalisationUpdate completed (1.26wmf1) at 2015-04-13 02:46:33+00:00 [02:47:40] Logged the message, Master [03:03:58] (03PS1) 10Dereckson: Set up Babel categories for hu.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203783 [03:05:07] (03PS2) 10Dereckson: Set up Babel categories for hu.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203783 (https://phabricator.wikimedia.org/T94842) [03:37:10] (03PS1) 10Dereckson: Set meta namespace on or.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203785 (https://phabricator.wikimedia.org/T94142) [04:26:45] ebernhardson: want some root chown help? [04:26:52] * YuviPanda|zzz just saw the sudo email [04:38:19] YuviPanda|zzz: lol, no its ok :) I was just running something w/ mwscript that i wanted to write to my home dir, so i was trying to ebernhardson.www-data it. i just gave it 777 for now not a bigdeal [04:38:33] heh ok :) [04:38:56] it emails you all when i try to badly sudo something? 
:) [04:40:13] ebernhardson: yup :P [04:40:31] ebernhardson: https://xkcd.com/838/ [04:40:41] I might or might not have a long white beard and be wearing red [04:40:54] :) [04:58:16] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [05:05:25] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 13 05:04:22 UTC 2015 (duration 4m 21s) [05:05:33] Logged the message, Master [05:53:06] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60617 bytes in 0.790 second response time [05:59:47] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [06:10:57] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60617 bytes in 1.075 second response time [06:29:47] PROBLEM - puppet last run on wtp2012 is CRITICAL Puppet has 2 failures [06:30:17] PROBLEM - puppet last run on ruthenium is CRITICAL Puppet has 1 failures [06:30:18] PROBLEM - puppet last run on subra is CRITICAL Puppet has 1 failures [06:30:18] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 1 failures [06:31:26] PROBLEM - puppet last run on mw1235 is CRITICAL Puppet has 2 failures [06:34:47] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 1 failures [06:35:07] PROBLEM - puppet last run on mw2113 is CRITICAL Puppet has 1 failures [06:35:07] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures [06:36:26] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 2 failures [06:44:56] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:44:57] RECOVERY - puppet last run on mw2023 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [06:45:57] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [06:46:06] RECOVERY - puppet last run on wtp2012 is OK Puppet is currently enabled, last run 14 seconds ago with 0 
failures [06:46:36] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [06:46:37] RECOVERY - puppet last run on subra is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures [06:47:47] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:47] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:08] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:15] (03CR) 10Tim Landscheidt: "*argl* Heisenbugs:" [puppet] - 10https://gerrit.wikimedia.org/r/203313 (https://phabricator.wikimedia.org/T88216) (owner: 10Tim Landscheidt) [06:58:48] (03CR) 10Tim Landscheidt: "Complete error after starting nginx and "curl http://toolsbeta-webproxy:8081/list":" [puppet] - 10https://gerrit.wikimedia.org/r/203313 (https://phabricator.wikimedia.org/T88216) (owner: 10Tim Landscheidt) [07:22:20] (03CR) 10Tim Landscheidt: "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719519; hasn't been backported to Trusty, filed https://bugs.launchpad.net/ubuntu/+sourc" [puppet] - 10https://gerrit.wikimedia.org/r/203313 (https://phabricator.wikimedia.org/T88216) (owner: 10Tim Landscheidt) [07:32:21] (03CR) 10Tim Landscheidt: "Why did it work previously? 
toolsbeta-webproxy was a Precise instance where the test went fine; then I replaced it with a Trusty instance" [puppet] - 10https://gerrit.wikimedia.org/r/203313 (https://phabricator.wikimedia.org/T88216) (owner: 10Tim Landscheidt) [07:43:06] PROBLEM - puppet last run on mw2178 is CRITICAL Puppet has 1 failures [07:59:06] RECOVERY - puppet last run on mw2178 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [08:04:36] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [08:10:16] (03PS3) 10Werdna: Make Hovercards default for Chinese, Catalan and Greek WP. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197038 (https://phabricator.wikimedia.org/T88164) [08:34:35] !log restarting stuck Jenkins [08:34:39] Logged the message, Master [08:59:38] !log ms-be10[678] account/container weight to 100 [08:59:44] Logged the message, Master [09:09:37] PROBLEM - puppet last run on cp3008 is CRITICAL puppet fail [09:27:46] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [09:45:10] (03PS1) 10Hashar: labnodepool1001: standard + add CI admins [puppet] - 10https://gerrit.wikimedia.org/r/203798 (https://phabricator.wikimedia.org/T95303) [10:03:32] 7Blocked-on-Operations, 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, and 2 others: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1202415 (10hashar) [10:13:16] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60617 bytes in 0.451 second response time [10:16:03] (03PS3) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [10:16:09] <_joe_> mobrovac: a first pass is ready, I'll continue working on it [10:17:01] cool [10:17:04] _joe_: thnx! 
[10:21:56] heh, you did quite a rework there :) [10:26:59] <_joe_> mobrovac: yes, mostly naming though [10:27:21] <_joe_> and I moved things that should be common (like the statsd host) to a specific class [10:27:49] <_joe_> because a class can be included in multiple places (and be autoconfigured via automatic hiera lookups [10:27:54] _joe_: why are init.pp and node.pp soooo similar? [10:28:07] <_joe_> because I forgot to remove init.pp [10:28:08] <_joe_> :P [10:29:17] <_joe_> also, I'm sure there are a ton of errors there [10:29:38] <_joe_> I'll fix them progressively [10:30:06] (03PS4) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [10:32:24] ok cool, I'll consider you the owner of the patch now :) [10:32:42] _joe_: if I happen to find some issues, I'll comment there [10:37:51] !log ms-be101[678] object weight to 2250 [10:37:57] Logged the message, Master [10:38:21] (03PS5) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [10:40:49] (03PS6) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [10:53:24] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1202529 (10Aklapper) Ping - who can hand this out to Jon? [10:55:22] (03PS7) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [10:55:51] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1202533 (10Jalexander) >>! In T90980#1202529, @Aklapper wrote: > Ping - who can hand this out to Jon? 
For better or worse it still seems like the best move right now is to give the password... [11:02:43] (03PS8) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [11:34:40] 7Puppet: Bashisms in various /bin/sh scripts - https://phabricator.wikimedia.org/T95064#1202588 (10scfc) p:5Lowest>3Low "It is very unlikely that on a Linux system /bin/sh will not be bash :-)" ­­— well, except for that that I ran those tests on, and every other instance in Labs, and probably all Ubuntu and... [11:48:46] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [12:01:48] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60617 bytes in 0.741 second response time [12:10:50] 6operations, 10Continuous-Integration: job creation permission on jenkins for WMDE-Fisch - https://phabricator.wikimedia.org/T95546#1202642 (10JanZerebecki) [12:22:53] (03CR) 10Mobrovac: Add a generic node service module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [12:25:42] (03PS1) 10Tim Landscheidt: Tools: Puppetize updatetools [puppet] - 10https://gerrit.wikimedia.org/r/203808 (https://phabricator.wikimedia.org/T94858) [12:26:29] (03CR) 10Tim Landscheidt: [C: 04-1] "WIP; just uploaded it here for backup." 
[puppet] - 10https://gerrit.wikimedia.org/r/203808 (https://phabricator.wikimedia.org/T94858) (owner: 10Tim Landscheidt) [12:30:25] !log bounce statsite on graphite1001 [12:30:29] Logged the message, Master [12:45:57] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [12:50:16] (03PS9) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [12:50:20] (03CR) 10Giuseppe Lavagetto: Add a generic node service module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [12:53:03] <_joe_> godog: a few alerts relating to graphite, in particular "carbon-relay queue full [12:55:53] _joe_: thanks, I'll take a look [12:56:00] (03CR) 10Mobrovac: Add a generic node service module (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [13:02:41] <_joe_> mobrovac: a few configs would change with the change you proposed btw [13:03:27] _joe_: re ferm::service/ [13:03:27] ? 
[13:03:51] <_joe_> nope, I thought I replied on that, 1 sec [13:04:17] (03CR) 10Giuseppe Lavagetto: Add a generic node service module (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [13:08:41] (03PS10) 10Giuseppe Lavagetto: Add a generic node service module [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [13:09:03] <_joe_> ok this version ^^ should be a no-op for citoid, practically [13:13:03] _joe_: yep, looks like it, modulo changes needed to make it work in beta, https://gerrit.wikimedia.org/r/#/c/203294/2 [13:13:14] i need to add there zotero_port as well now [13:14:21] (03CR) 10Anomie: Flagged revisions configuration on ca.wikinews (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [13:17:19] <_joe_> mobrovac: I'm gonna merge the change, apply it on one host only, then I'll ask you to validate everything's working fine [13:17:38] _joe_: question [13:18:01] in role/citoid.pp, shouldn't it be hiera('role::citoid::zotero_host') [13:18:02] ? [13:18:11] <_joe_> why? 
[13:18:16] <_joe_> it's a direct lookup [13:18:20] ah ok [13:18:31] <_joe_> we can call the variable "unicorn" if we prefer :) [13:18:43] but role::citoid::zotero_host would not need hiera() [13:18:59] <_joe_> mobrovac: only if it was a class parameter [13:19:05] i see [13:19:07] nm then [13:19:16] <_joe_> and it is recommended not to use class parameters on roles [13:19:23] (03PS1) 10KartikMistry: Beta: Add requested languages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/203812 [13:19:24] _joe_: go ahead with testing, i'm standing by [13:19:30] <_joe_> ok [13:20:06] (03PS3) 10Mobrovac: Citoid: Change Zotero's host for Beta [puppet] - 10https://gerrit.wikimedia.org/r/203294 (https://phabricator.wikimedia.org/T95616) [13:21:52] (03CR) 10Amire80: [C: 031] Beta: Add requested languages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/203812 (owner: 10KartikMistry) [13:22:26] (03PS2) 10KartikMistry: Beta: Add requested languages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/203812 [13:23:50] (03PS3) 10KartikMistry: Beta: Add requested languages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/203812 [13:24:40] (03CR) 10Giuseppe Lavagetto: [C: 032] "The puppet compiler says this is substantially a noop" [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [13:28:02] <_joe_> mobrovac: do we really want to install Notice: /Stage[main]/Packages::Nodejs_legacy/Package[nodejs-legacy]/ensure: current_value purged, should be present (noop [13:28:48] _joe_: either that or have a manual exec ln -s /usr/bin/nodejs /usr/bin/node [13:29:00] the pkg just provides us the softlink [13:29:10] <_joe_> oh ok [13:29:26] src/server.js has the /usr/bin/node shebang so we need it [13:29:38] <_joe_> mobrovac: citoid on sca1001 has been updated [13:29:42] ok, checking [13:29:44] <_joe_> it should work correctly anyway [13:29:46] <_joe_> I am too [13:31:49] 
_joe_: working! :) [13:32:45] Hi [13:32:51] I can't clean the watchlist from Grillitus (es.wikt) sure is very large... https://es.wiktionary.org/wiki/Especial:EditarSeguimiento/clear it gives me "WIKIMEDIA FOUNDATION Error", Grillitus is my bot [13:32:52] (03PS2) 10BBlack: cache.pp cleanup: class names: s/r::c::varnish::/r::c::/ [puppet] - 10https://gerrit.wikimedia.org/r/203558 [13:32:54] (03PS2) 10BBlack: cache.pp cleanup: ws (some nodes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203556 [13:32:56] (03PS2) 10BBlack: cache.pp cleanup: -decommed_nodes list (unused?) [puppet] - 10https://gerrit.wikimedia.org/r/203557 [13:32:58] (03PS2) 10BBlack: cache.pp cleanup: ws (classes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203554 [13:33:00] (03PS2) 10BBlack: cache.pp cleanup: a little formatting [puppet] - 10https://gerrit.wikimedia.org/r/203555 [13:33:02] (03PS2) 10BBlack: cache.pp cleanup: un-nest class defs [puppet] - 10https://gerrit.wikimedia.org/r/203553 [13:33:04] (03PS2) 10BBlack: cache.pp cleanup: various format nits [puppet] - 10https://gerrit.wikimedia.org/r/203562 [13:33:06] (03PS2) 10BBlack: cache.pp cleanup: split to one global class/def per file [puppet] - 10https://gerrit.wikimedia.org/r/203563 [13:33:08] (03PS2) 10BBlack: cache.pp cleanup: qualify localssl helper def as r::c::ssl::local [puppet] - 10https://gerrit.wikimedia.org/r/203560 [13:33:10] (03PS2) 10BBlack: cache.pp cleanup: decompress ssl def usage [puppet] - 10https://gerrit.wikimedia.org/r/203561 [13:33:14] (03PS4) 10Mobrovac: Citoid: Change Zotero's host for Beta [puppet] - 10https://gerrit.wikimedia.org/r/203294 (https://phabricator.wikimedia.org/T95616) [13:33:26] (03Abandoned) 10BBlack: cache.pp cleanup: class names: s/r::c::ssl::/r::c::ssl_/ [puppet] - 10https://gerrit.wikimedia.org/r/203559 (owner: 10BBlack) [13:33:27] <_joe_> bblack: ooooh [13:33:39] Hi, _joe_ [13:33:47] <_joe_> hi! [13:33:59] is there the place to solve my problem? 
[13:34:01] _joe_: nothing substantive in there yet, it's just no-op stuff to make it a little easier to read before diving into bigger things :) [13:34:05] *is here [13:34:39] Request: GET http://es.wiktionary.org/wiki/Especial:EditarSeguimiento/raw, from 10.64.32.105 via cp1068 cp1068 ([10.64.0.105]:3128), Varnish XID 190670131, [13:34:40] Forwarded for: 200.111.53.98, 10.64.32.105, 10.64.32.105, Error: 503, Service Unavailable at Mon, 13 Apr 2015 13:30:09 GMT [13:34:45] <_joe_> Hprmedina: I can surely try to get what's going on, but I guess it's better if you open a bug on phabricator [13:35:09] Grillitus creates at least 700.000 articles [13:35:35] the watch list sure has that amount... [13:35:49] <_joe_> Hprmedina: ok so I'm pretty sure you are hitting some kind of limitation in our software [13:36:02] <_joe_> open a bug for mediawiki please [13:36:17] true, maybe the solution y delete the records on then table directly [13:36:30] where can I open the bug? [13:36:40] please tellme the url [13:36:47] <_joe_> https://phabricator.wikimedia.org [13:36:57] Hprmedina: hi [13:36:59] thanks [13:37:02] We can delete that server side [13:37:05] hi hoo [13:37:10] _joe_: That's well known since forever [13:37:10] ups [13:37:17] <_joe_> hoo: oh, ok [13:37:29] <_joe_> I never encountered it I guess [13:37:39] <_joe_> bblack: I'll take a look [13:37:42] Occasionally I deleted such watchlists per hand using the batch query maint. script [13:38:16] Hprmedina: You should still open a ticket though for having the watchlist cleared [13:38:23] so that I have something to refer to [13:38:26] I forgive to unmark "Add pages I create and files I upload to my watchlist" :S [13:38:38] _joe_: up through the 2nd-to-last one, I've checked puppet-compiler btw. the last one that splits files seems to confuse it on finding classes, but I'm not sure if I want to merge that anyways. It may just make more-substantive refactors more of a pain for now. 
[13:38:47] ok, I will open a ticket [13:39:04] <_joe_> bblack: ok [13:39:29] 6operations, 7Graphite: Counts with underscore in name no longer updated since move to statsite (cassandra metrics) - https://phabricator.wikimedia.org/T95627#1202733 (10fgiunchedi) I've proposed a patch upstream and applied it to our statsite package running on graphite1001 so we should have the right values... [13:39:34] thanks, _joe_ and hoo XD [13:39:52] _joe_: but reviews welcome. keep in mind I don't expect the net result of those changes to perfect, just better than how it looked before :) [13:40:12] <_joe_> bblack: that's exactly what we need, I think [13:41:54] (03CR) 10Giuseppe Lavagetto: [C: 031] cache.pp cleanup: class names: s/r::c::varnish::/r::c::/ [puppet] - 10https://gerrit.wikimedia.org/r/203558 (owner: 10BBlack) [13:45:03] 6operations, 10ops-fundraising: fundraising system kernel updates - https://phabricator.wikimedia.org/T95887#1202741 (10Jgreen) 3NEW [13:45:11] 6operations, 10ops-fundraising: fundraising system kernel updates - https://phabricator.wikimedia.org/T95887#1202750 (10Jgreen) p:5Triage>3High [13:45:39] 6operations, 7Graphite: Counters now only provide rates (multiplied by 1000?) - https://phabricator.wikimedia.org/T95703#1202752 (10fgiunchedi) I agree though that having more metrics on counters would be useful, statsite doesn't let you select what metrics to export for extended counters but we could filter t... 
[13:46:06] (03CR) 10Giuseppe Lavagetto: [C: 031] cache.pp cleanup: qualify localssl helper def as r::c::ssl::local [puppet] - 10https://gerrit.wikimedia.org/r/203560 (owner: 10BBlack) [13:49:02] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "Since this is mostly cosmetic, I have one nitpick" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203561 (owner: 10BBlack) [13:49:36] 10Ops-Access-Requests, 6operations, 10Continuous-Integration: Add user wmde-fisch to LDAP group wmde - https://phabricator.wikimedia.org/T95546#1202755 (10hashar) [13:50:08] 10Ops-Access-Requests, 6operations, 10Continuous-Integration: Add user wmde-fisch to LDAP group wmde - https://phabricator.wikimedia.org/T95546#1194224 (10hashar) To make ops work easier, I have rephrased the task title and added some context to its description. [13:50:55] _joe_: wouldn't it be nice if gerrit would show a branch in commit dependency order? you're going through them in arbitrary order as a result :/ [13:51:06] <_joe_> bblack: meh [13:51:11] yeah that is lame [13:51:18] <_joe_> bblack: I'm going in "depends on" order, tbh [13:51:20] makes working with depednent patches a nightmare [13:51:22] I think last time I looked, there was an open bug for gerrit that was old heh [13:52:03] _joe_: it starts here: https://gerrit.wikimedia.org/r/#/c/203553/ [13:52:12] ok :) [13:52:30] <_joe_> ok [13:52:32] another way to deal with serie of patches, is to send them to a sandbox branch [13:53:00] get them reviewed there (not sure patches in sandboxes are publicly viewable , but maybe you can adde reviewers) [13:53:04] (03CR) 10Giuseppe Lavagetto: [C: 031] cache.pp cleanup: various format nits (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203562 (owner: 10BBlack) [13:53:08] then send a merge commit of that sandbox branch to the target branch [13:53:17] so you would review / merge in the sandbox [13:53:26] then +2 / submit the merge commlit [13:53:57] (03CR) 10Giuseppe Lavagetto: [C: 031] "Yes please." 
[puppet] - 10https://gerrit.wikimedia.org/r/203553 (owner: 10BBlack) [13:54:08] https://www.mediawiki.org/wiki/Gerrit/personal_sandbox [13:54:51] hoo, _joe_ , done: https://phabricator.wikimedia.org/T95888#1202762 [13:55:18] (03CR) 10Giuseppe Lavagetto: [C: 031] "I can't really follow what happened here because of gerrit. If it's a no-op in the compiler, then we're ok." [puppet] - 10https://gerrit.wikimedia.org/r/203554 (owner: 10BBlack) [13:55:50] (03CR) 10Giuseppe Lavagetto: [C: 031] cache.pp cleanup: a little formatting [puppet] - 10https://gerrit.wikimedia.org/r/203555 (owner: 10BBlack) [13:56:08] _joe_: on the two that have "ws" in the title like that (whitespace), yeah gerrit makes a mess, but "git diff -w" shows zero real change. [13:56:45] (03PS2) 10Dereckson: Flagged revisions configuration on ca.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) [13:57:17] Hprmedina: :) Will take care of that sometime today [13:57:25] no rush XD [13:57:28] thanks [13:58:31] (03CR) 10Dereckson: "PS2: rebased, updated whitespaces" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [13:58:36] (03CR) 10Giuseppe Lavagetto: [C: 031] cache.pp cleanup: ws (some nodes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203556 (owner: 10BBlack) [13:58:53] (03CR) 10Dereckson: Flagged revisions configuration on ca.wikinews (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [13:59:08] nice tool, phabricator, my first time using it :P [14:00:32] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "role::cache::configuration::decommissioned_nodes is referenced repeatedly in the torrus module, so I'd probably investigate that first." [puppet] - 10https://gerrit.wikimedia.org/r/203557 (owner: 10BBlack) [14:01:13] oh yeah I forgot about torrus. 
there are multiple cleanup problems for cache.pp-vs-torrus I think, I just started ignoring them until I can ping alex about what it is. [14:01:28] <_joe_> ahah ok [14:01:45] good catch though, I would've forgotten by now :) [14:02:25] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "In order for this to work, you'd have to add an "import('role/cache/*.pp')" to site.pp, right where roles are imported nowadays." [puppet] - 10https://gerrit.wikimedia.org/r/203563 (owner: 10BBlack) [14:02:47] ah that explains why it can't find anything in the compiler for that one! [14:03:53] _joe_: any general thoughts on whether to split it up at all or not? I'm kinda on the fence about that one. (mostly because I think a lot more real refactoring will happen soon, and splitting it now might just make that more of a PITA) [14:04:32] <_joe_> bblack: I want to split it up, but when we do that, we should also move it to modules/role/ so that autoloading works [14:05:01] modules/role/ ? :) [14:05:14] <_joe_> bblack: can I amend 203563 ? 
[14:05:33] yeah [14:05:35] <_joe_> bblack: well, autoloading works in puppet only if classes are under modules [14:05:36] PROBLEM - puppet last run on ms-be1013 is CRITICAL Puppet has 1 failures [14:05:39] well [14:05:42] actually don't yet [14:05:49] <_joe_> ok [14:06:08] let me fix the other bits first, it's a PITA to re-base changes into the splitting one [14:06:14] <_joe_> I wanted to add a role/cache.pp that would only import all the manifests we need [14:06:17] <_joe_> yes [14:06:19] <_joe_> right [14:06:28] <_joe_> I'm gonna brew coffee [14:07:07] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60615 bytes in 0.780 second response time [14:07:07] (03PS1) 10Ottomata: Restrict file persmissions on eventlogging mysql consumer [puppet] - 10https://gerrit.wikimedia.org/r/203815 [14:07:56] (03CR) 10Milimetric: [C: 031] Restrict file persmissions on eventlogging mysql consumer [puppet] - 10https://gerrit.wikimedia.org/r/203815 (owner: 10Ottomata) [14:20:11] (03CR) 10Ottomata: [C: 032] Restrict file persmissions on eventlogging mysql consumer [puppet] - 10https://gerrit.wikimedia.org/r/203815 (owner: 10Ottomata) [14:20:50] 6operations, 10Continuous-Integration: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1202809 (10hashar) 5Open>3declined a:3hashar Per my previous comment, the yaml linting should be done by a test suite in the operations/puppet.git repository. CI would in... 
[14:22:06] RECOVERY - puppet last run on ms-be1013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [14:24:27] mobrovac: going to merge https://gerrit.wikimedia.org/r/#/c/203294/4 [14:26:25] (03PS5) 10Filippo Giunchedi: Citoid: Change Zotero's host for Beta [puppet] - 10https://gerrit.wikimedia.org/r/203294 (https://phabricator.wikimedia.org/T95616) (owner: 10Mobrovac) [14:26:31] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Citoid: Change Zotero's host for Beta [puppet] - 10https://gerrit.wikimedia.org/r/203294 (https://phabricator.wikimedia.org/T95616) (owner: 10Mobrovac) [14:29:21] (03CR) 10Anomie: Flagged revisions configuration on ca.wikinews (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [14:29:31] _joe_: would you rather role/cache.pp with imports, or just move it all modules/role/cache/ ? [14:29:56] <_joe_> I dunno, tbh [14:30:01] (03PS3) 10BBlack: cache.pp cleanup: class names: s/r::c::varnish::/r::c::/ [puppet] - 10https://gerrit.wikimedia.org/r/203558 [14:30:03] (03PS3) 10BBlack: cache.pp cleanup: ws (some nodes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203556 [14:30:05] (03PS3) 10BBlack: cache.pp cleanup: -decommed_nodes list (unused?) 
[puppet] - 10https://gerrit.wikimedia.org/r/203557 [14:30:07] (03PS3) 10BBlack: cache.pp cleanup: ws (classes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203554 [14:30:09] (03PS3) 10BBlack: cache.pp cleanup: a little formatting [puppet] - 10https://gerrit.wikimedia.org/r/203555 [14:30:11] (03PS3) 10BBlack: cache.pp cleanup: un-nest class defs [puppet] - 10https://gerrit.wikimedia.org/r/203553 [14:30:13] (03PS3) 10BBlack: cache.pp cleanup: various format nits [puppet] - 10https://gerrit.wikimedia.org/r/203562 [14:30:15] (03PS3) 10BBlack: cache.pp cleanup: split to one global class/def per file [puppet] - 10https://gerrit.wikimedia.org/r/203563 [14:30:17] (03PS3) 10BBlack: cache.pp cleanup: qualify localssl helper def as r::c::ssl::local [puppet] - 10https://gerrit.wikimedia.org/r/203560 [14:30:19] (03PS3) 10BBlack: cache.pp cleanup: decompress ssl def usage [puppet] - 10https://gerrit.wikimedia.org/r/203561 [14:30:43] akosiaris: ping? [14:30:58] <_joe_> probably I'd go with the latter, because we won't load all the role* classes for _every_ node but just the ones that include them [14:31:13] <_joe_> bblack: alex is on easter vacation today, as well as faidon [14:31:47] oh, right [14:32:10] anyways, ignore the PS3 uploads above, something got mixed up in rebasing somewhere :) [14:33:24] oh no, it didn't. the patches are ok, other than nothing done about the splitting one yet. gerrit just confused me. [14:33:55] and I still need to ping alex about torrus since it seems to be his. for now I just gutted related things in there in the patch. [14:35:36] (03PS4) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [14:35:56] <_joe_> bblack: ah! we have one more gotcha there [14:37:01] http://torrus.wikimedia.org/torrus/CDN?path=/Varnish/eqiad/upload/cp1071.eqiad.wmnet/frontend/Cache_performance/Cache_hit_ratio&view=longterm-rrd-html [14:37:15] I never even knew that existed. 
looks useful, but graphs area dead :/ [14:41:19] <_joe_> bblack: the @monitoring::group definitions you had at the start of the role [14:41:31] <_joe_> bblack: they work only if everything is imported everywhere [14:41:53] <_joe_> in autoload layout, you'd need to use @@ and collect them in the icinga role [14:41:56] <_joe_> I guess [14:42:29] I could leave them behind in manifests/role/cache.pp for now and deal with it later [14:43:13] (03PS1) 10Filippo Giunchedi: varnishkafka: fix spurious UNKNOWN alert [puppet] - 10https://gerrit.wikimedia.org/r/203829 (https://phabricator.wikimedia.org/T90111) [14:45:02] I'm gonna re-arrange the torrus-affecting commit to the end anyways, so that I can merge the rest of it up [14:46:03] 6operations, 10ops-fundraising, 7network: network setup for beryllium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T95893#1202933 (10Jgreen) 3NEW [14:47:08] 6operations, 10ops-fundraising, 7network: network setup for beryllium.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T95893#1202951 (10Jgreen) references: T95617 and T95620 [14:50:12] marktraceur, ^d, thcipriani: Who wants to SWAT this morning? [14:50:31] Dereckson, jdlrobson: Ping for SWAT in 10 minutes. [14:51:17] godog: are you planning to merge https://gerrit.wikimedia.org/r/203829 now? [14:51:38] bblack: asap yeah, can do straight away if you want to rebase [14:51:47] yeah go for it [14:51:58] * Dereckson is there. [14:52:01] Hello. 
[14:52:13] <_joe_> bblack: yeah seems like a good option [14:52:18] (03PS2) 10Filippo Giunchedi: varnishkafka: fix spurious UNKNOWN alert [puppet] - 10https://gerrit.wikimedia.org/r/203829 (https://phabricator.wikimedia.org/T90111) [14:52:26] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] varnishkafka: fix spurious UNKNOWN alert [puppet] - 10https://gerrit.wikimedia.org/r/203829 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [14:52:31] <_joe_> we can deal with that at a later time because I have no magic solution for it [14:52:51] bblack: merged [14:53:21] anomie: doesn't look like I have permission to view jdlrobson 's patch :\ [14:53:48] thcipriani: I can copy it to tin for you [14:54:13] kk, that'd be good, I can swat then [14:58:31] Dereckson: just want to confirm that you wanted an array union in gerrit 203283 for $wgGroupPermissions['reviewer']? [14:59:10] 6operations, 10ops-eqiad: mw1031 has a bad uplink - https://phabricator.wikimedia.org/T95896#1202993 (10Joe) 3NEW [14:59:11] thcipriani: yes, we added rights. [14:59:54] kk, any particular order for these patches? [15:00:05] manybubbles, anomie, ^d, thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150413T1500). 
[15:00:22] 199923, 199927 together ; no preference for the extra ones [15:00:35] <_joe_> !log depooled mw1031 [15:00:42] Logged the message, Master [15:01:14] (03CR) 10Thcipriani: [C: 032] Create 'Portal' and 'Portal_Discussió' namespaces at cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199923 (https://phabricator.wikimedia.org/T93811) (owner: 10Gerardduenas) [15:01:23] (03Merged) 10jenkins-bot: Create 'Portal' and 'Portal_Discussió' namespaces at cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199923 (https://phabricator.wikimedia.org/T93811) (owner: 10Gerardduenas) [15:01:35] (03CR) 10Thcipriani: [C: 032] Add import sources for cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199927 (https://phabricator.wikimedia.org/T93750) (owner: 10Gerardduenas) [15:01:44] (03Merged) 10jenkins-bot: Add import sources for cawikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/199927 (https://phabricator.wikimedia.org/T93750) (owner: 10Gerardduenas) [15:04:43] (03PS4) 10BBlack: cache.pp cleanup: class names: s/r::c::varnish::/r::c::/ [puppet] - 10https://gerrit.wikimedia.org/r/203558 [15:04:45] (03PS4) 10BBlack: cache.pp cleanup: ws (some nodes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203556 [15:04:47] (03PS4) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:04:49] (03PS4) 10BBlack: cache.pp cleanup: ws (classes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203554 [15:04:51] (03PS4) 10BBlack: cache.pp cleanup: a little formatting [puppet] - 10https://gerrit.wikimedia.org/r/203555 [15:04:53] (03PS4) 10BBlack: cache.pp cleanup: un-nest class defs [puppet] - 10https://gerrit.wikimedia.org/r/203553 [15:04:55] (03PS4) 10BBlack: cache.pp cleanup: various format nits [puppet] - 10https://gerrit.wikimedia.org/r/203562 [15:04:57] (03PS5) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 
10https://gerrit.wikimedia.org/r/203563 [15:04:59] (03PS4) 10BBlack: cache.pp cleanup: qualify localssl helper def as r::c::ssl::local [puppet] - 10https://gerrit.wikimedia.org/r/203560 [15:05:01] (03PS4) 10BBlack: cache.pp cleanup: decompress ssl def usage [puppet] - 10https://gerrit.wikimedia.org/r/203561 [15:06:14] !log thcipriani Synchronized wmf-config/InitialiseSettings.php: Morning swat [[gerrit|199927]] and [[gerrit|199923]] (duration: 00m 11s) [15:06:17] Logged the message, Master [15:06:39] 199923 verified through https://ca.wikibooks.org/w/api.php?action=query&meta=siteinfo&siprop=namespaces%7Cnamespacealiases - it works [15:06:39] ^ Dereckson 23 and 27 went, everything look ok? [15:06:42] kk [15:06:47] going ahead with the others [15:06:49] _joe_: unless you have any other objections, gonna merge up everything but what are now the final two (file split, -decommed_nodes bit). I think your only other thing within those was the one about the sni_cert decompress, and I just undid that part for now instead of switching the array thing [15:07:11] <_joe_> +1 [15:07:23] (03CR) 10Thcipriani: [C: 032] Added *.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) (owner: 10Dereckson) [15:07:31] (03Merged) 10jenkins-bot: Added *.adlibhosting.com to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/202736 (https://phabricator.wikimedia.org/T95418) (owner: 10Dereckson) [15:07:58] hm, godog, not sure, but wouldnt'i t be better to have an UNKNOWN alert because of missing data, rather than a criticalish alert fired because now we are reading 0s? [15:08:05] like, if this was an alert on data throughput [15:08:15] oh pssh [15:08:21] well, this is an alert on error counts [15:08:26] hm. 
[15:08:30] (03CR) 10BBlack: [C: 032] cache.pp cleanup: un-nest class defs [puppet] - 10https://gerrit.wikimedia.org/r/203553 (owner: 10BBlack) [15:08:43] *also* I've misplaced a ) there :( will wait for bblack to merge and rebase mine [15:08:50] (03CR) 10BBlack: [C: 032] cache.pp cleanup: ws (classes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203554 (owner: 10BBlack) [15:09:01] (03CR) 10BBlack: [C: 032] cache.pp cleanup: a little formatting [puppet] - 10https://gerrit.wikimedia.org/r/203555 (owner: 10BBlack) [15:09:13] (03CR) 10BBlack: [C: 032] cache.pp cleanup: ws (some nodes unindent) [puppet] - 10https://gerrit.wikimedia.org/r/203556 (owner: 10BBlack) [15:09:16] !log thcipriani Synchronized wmf-config/InitialiseSettings.php: Morning swat [[gerrit|202736]] (duration: 00m 11s) [15:09:19] Logged the message, Master [15:09:42] godog: ok I won't palladium-merge at the end [15:09:59] (03CR) 10BBlack: [C: 032] cache.pp cleanup: class names: s/r::c::varnish::/r::c::/ [puppet] - 10https://gerrit.wikimedia.org/r/203558 (owner: 10BBlack) [15:10:05] bblack: kk, review coming up [15:10:09] (03CR) 10BBlack: [C: 032] cache.pp cleanup: qualify localssl helper def as r::c::ssl::local [puppet] - 10https://gerrit.wikimedia.org/r/203560 (owner: 10BBlack) [15:10:14] ottomata: ye, sec [15:10:21] (03CR) 10BBlack: [C: 032] cache.pp cleanup: decompress ssl def usage [puppet] - 10https://gerrit.wikimedia.org/r/203561 (owner: 10BBlack) [15:10:31] (03CR) 10BBlack: [C: 032] cache.pp cleanup: various format nits [puppet] - 10https://gerrit.wikimedia.org/r/203562 (owner: 10BBlack) [15:10:33] Dereckson: 202736 look fine? [15:10:42] godog: that's all I'm merging for now, rebase on that. 
[15:10:55] (a Commons admin is testing a 202736 upload right now) [15:11:25] neat, thanks :) [15:12:34] (03PS1) 10Filippo Giunchedi: varnishkafka: fix misplaced parenthesis [puppet] - 10https://gerrit.wikimedia.org/r/203831 [15:12:55] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] varnishkafka: fix misplaced parenthesis [puppet] - 10https://gerrit.wikimedia.org/r/203831 (owner: 10Filippo Giunchedi) [15:13:00] 202736 ok <-- 15:12:45 < rama> je confirme que ça marche [15:13:01] bblack: good to puppet-merge [15:13:39] * bblack ducks under the table to hide from what will surely be a torrent of puppetfails [15:14:04] (03CR) 10Thcipriani: [C: 032] Enable SandboxLink extension on fr.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203279 (https://phabricator.wikimedia.org/T95604) (owner: 10Dereckson) [15:14:10] ottomata: I think in this case with zero it is fine, we could increase the time window instead but that decreases the chances really [15:14:44] (03Merged) 10jenkins-bot: Enable SandboxLink extension on fr.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203279 (https://phabricator.wikimedia.org/T95604) (owner: 10Dereckson) [15:17:11] !log thcipriani Synchronized wmf-config/InitialiseSettings.php: Morning swat [[gerrit|203279]] (duration: 00m 14s) [15:17:16] Logged the message, Master [15:18:57] 203279 seems ok [15:20:06] Dereckson: thanks, any whitespace updates to 203283 before merge? 
[15:20:31] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [15:21:49] thcipriani: no, PS2 fixed the whitespace correctly [15:21:59] kk, going with that one then [15:21:59] oh read the anomie comment [15:22:26] (03CR) 10Thcipriani: [C: 032] Flagged revisions configuration on ca.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [15:22:29] anomie: we have quite a lot of configuration files using vertical alignment for bugs [15:22:32] (03PS5) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:22:34] (03PS6) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [15:22:36] (03Merged) 10jenkins-bot: Flagged revisions configuration on ca.wikinews [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203283 (https://phabricator.wikimedia.org/T95085) (owner: 10Dereckson) [15:22:42] anomie: you wish we plan a general cleanup of these files to hunt them? [15:23:10] Dereckson: I wouldn't necessarily hold up a merge for the whitespace issue there, but it's always good not to add more tech debt. [15:23:15] anomie: for example, all the namespaces sections in InitialiseSettings.php are vertical aligned. [15:24:57] !log thcipriani Synchronized wmf-config/flaggedrevs.php: Morning swat [[gerrit|203283]] (duration: 00m 14s) [15:24:59] fwiw, I tend to think it's clearer to fix whitespace separately from any functional commit [15:25:04] Logged the message, Master [15:25:21] (either do it before, separately, or leave the whitespace broken-but-consistent in the functional commit first and fix it all later) [15:25:53] bblack: FYI, this is about newly-added bad whitespace [15:26:02] ah ok [15:26:12] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 9 unmerged changes in puppet (dir /var/lib/git/operations/puppet). 
[15:26:18] stupid strontium [15:26:19] (03PS4) 10Andrew Bogott: Add mholloway to bastion-only and researchers. [puppet] - 10https://gerrit.wikimedia.org/r/203341 [15:26:19] jdlrobson: ping for T94128 swat [15:26:24] thcipriani: hi! [15:26:37] cool, working on that patch now :) [15:26:49] https://phabricator.wikimedia.org/T95900 created for a general vertical align hunt. [15:26:52] thcipriani: you have access to the bug? [15:27:13] anomie put the patch on tin for me [15:27:17] perfeeectttt let me know any questions [15:27:51] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [15:27:53] one dumb question which versions does this apply to? [15:27:56] (03CR) 10Andrew Bogott: [C: 032] Add mholloway to bastion-only and researchers. [puppet] - 10https://gerrit.wikimedia.org/r/203341 (owner: 10Andrew Bogott) [15:29:02] thcipriani: 1.25wmf24 and 1.26wmf1, presumably [15:29:27] got it, thanks [15:32:09] _joe_: I still get errors on the splitting commit: http://puppet-compiler.wmflabs.org/688/change/203563/compiled/puppet_catalogs_3_203563/cp3015.esams.wmnet.warnings [15:32:12] 6operations, 10Deployment-Systems, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1203111 (10GWicke) [15:32:37] but I think that's just the compiler confused by new autoloader files? I know I expect the puppetmasters to be confused until they're restarted, based on past experience :/ [15:32:43] <_joe_> bblack: lemme look at the patch better [15:33:27] 6operations, 7HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#1203115 (10chasemp) >>! In T86081#1198014, @Krenair wrote: > @chasemp: It's not clear to me what James can really be expected to do here? This is a task for an ops engineer. Assigne... 
[15:34:02] oh duh [15:34:07] ok, scap incoming [15:34:08] I should move those to modules/role/manifests/ [15:34:21] not modules/role/cache/foo.pp [15:34:45] _joe_: I think it's that ^ [15:34:48] !log thcipriani Started scap: Morning swat for T94128 [15:34:55] Logged the message, Master [15:34:55] <_joe_> bblack: yes! [15:35:08] <_joe_> modules/role/manifests/cache/foo.php [15:35:13] <_joe_> eh, lapsus [15:36:18] Thank you for the deployment thcipriani. [15:36:28] (03PS6) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:36:30] (03PS7) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [15:36:34] Dereckson: yw, thanks for making it easy :) [15:37:14] 10Ops-Access-Requests, 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1203134 (10Andrew) @Mhalloway, you should have access now. Please verify and then close this ticket if all is well. [15:38:58] (03CR) 10coren: [C: 032] "Very simple change." 
[puppet] - 10https://gerrit.wikimedia.org/r/203384 (https://phabricator.wikimedia.org/T95555) (owner: 10coren) [15:41:40] 10Ops-Access-Requests, 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1203179 (10Andrew) [15:42:17] 10Ops-Access-Requests, 6operations, 10Analytics-Cluster, 5Patch-For-Review: Requesting access to analytics-users (stat1002) for Jkatz - https://phabricator.wikimedia.org/T94939#1203191 (10Andrew) [15:42:38] 10Ops-Access-Requests, 6operations, 10MediaWiki-extensions-ContentTranslation, 3LE-Sprint-84, 5Patch-For-Review: Access to stat1003 for Niklas and Kartik - https://phabricator.wikimedia.org/T91625#1203200 (10Andrew) [15:42:40] (03PS7) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:42:42] (03PS8) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [15:46:21] (03PS8) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:46:23] (03PS9) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [15:46:25] (03PS1) 10Ottomata: Add joal to analytics contactgroup [puppet] - 10https://gerrit.wikimedia.org/r/203838 [15:47:15] anyone know of a magical set of git diff/show flags that will make it easy to review "split the contents of this file into a bunch of other files"? and only show the *real* content changes, but leave out the "moving identical blocks around" part? [15:47:26] like "git diff -w" does for whitespace changes [15:47:38] 10Ops-Access-Reviews, 6operations: eventlogging-roots for nuria - https://phabricator.wikimedia.org/T88823#1203224 (10Andrew) 5Open>3Resolved a:3Andrew Yeah, I misfiled this. Nuria is in eventlogging-roots and eventlogging-admins, so this is clearly resolved. 
[15:47:39] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting deployment access for milimetric - https://phabricator.wikimedia.org/T88769#1203227 (10Andrew) [15:48:20] 10Ops-Access-Requests, 6operations: Give joal access to eventlog1001.eqiad.wmnet - https://phabricator.wikimedia.org/T95905#1203231 (10JAllemandou) 3NEW [15:50:01] doing sync-proxies now, FYI [15:50:07] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting deployment access for milimetric - https://phabricator.wikimedia.org/T88769#1203246 (10Andrew) [15:50:09] 10Ops-Access-Reviews, 6operations: eventlogging-roots for milimetric - https://phabricator.wikimedia.org/T88822#1203243 (10Andrew) 5Open>3Resolved a:3Andrew I misfiled this, but it is now resolved. [15:50:39] (03CR) 10Ottomata: [C: 032] Add joal to analytics contactgroup [puppet] - 10https://gerrit.wikimedia.org/r/203838 (owner: 10Ottomata) [15:50:58] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting deployment access for nuria - https://phabricator.wikimedia.org/T88760#1203250 (10Andrew) [15:51:23] 10Ops-Access-Requests, 6operations: Provide Toby Negrin and Dan Garry with Google Webmaster Tools access - https://phabricator.wikimedia.org/T85938#1203256 (10Andrew) [15:52:50] (03PS9) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [15:52:52] (03PS10) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [15:56:46] thcipriani: is it safe to merge that patch to master yet? [15:57:01] it's about half way through syncing apaches [15:57:33] awesome [15:57:44] jgage: ready for the baton? 
[15:57:51] jdlrobson: at this pace, it'll be another 8 or so minutes [15:58:01] (03PS1) 10Filippo Giunchedi: mediawiki: deprecate meters in mwerrors.py [puppet] - 10https://gerrit.wikimedia.org/r/203843 (https://phabricator.wikimedia.org/T90111) [15:58:03] (03PS1) 10Filippo Giunchedi: webperf: deprecate statsd meters [puppet] - 10https://gerrit.wikimedia.org/r/203844 (https://phabricator.wikimedia.org/T90111) [15:58:13] http://puppet-compiler.wmflabs.org/692/change/203563/html/cp1064.eqiad.wmnet.html <- how the hell is ssh::userkey[root] coming up as a diff from my cache.pp split ? [15:58:31] AFAICS, only the ganeti role has ssh::userkey[root] ... hmmmm ... [15:58:50] thcipriani: great [16:01:12] (03CR) 10Alex Monk: [C: 04-1] "Unmerged dependency that needs a rebase" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/197038 (https://phabricator.wikimedia.org/T88164) (owner: 10Werdna) [16:02:56] chasemp: when you have 20 minutes to spare, I’d appreciate a second opinion [16:03:08] andrewbogott: I'm about, on what? [16:03:15] host aggregates :) [16:03:39] sure is this for CI things? [16:03:55] It’s not for CI, it just happens to overlap your recent reading on the subject. [16:04:13] I can’t make it work, probably because nova is just broken, but I want you to agree that it’s broken before I despair and hack up some other solution. [16:06:08] jdlrobson: 30 servers left [16:08:37] <_joe_> bbl [16:09:10] ok I'm chalking that up to a false-positive related to private-repo, etc [16:11:40] thcipriani: i have to head into the office soon for a meeting. any close to being done? [16:12:05] rebuilding-cdbs now [16:12:23] another few minutes for that step [16:12:40] :-/ [16:13:04] indeed. [16:16:00] !log thcipriani Finished scap: Morning swat for T94128 (duration: 41m 12s) [16:16:05] Logged the message, Master [16:16:11] horray! jdlrobson ^ [16:17:49] thcipriani: sweet! :D [16:18:17] thcipriani: no errors to report? 
[16:18:32] jdlrobson: nope, everything ran cleanly [16:18:50] thanks thcipriani! :D gotta run! [16:20:39] (03PS10) 10BBlack: cache config: remove decommed nodes list [puppet] - 10https://gerrit.wikimedia.org/r/203557 [16:20:41] (03PS11) 10BBlack: cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 [16:23:53] !log disabling puppet on caches for cache.pp-split merge, will be restarting puppetmasters too... [16:23:58] Logged the message, Master [16:24:13] ^ I expect this will cause unavoidable but hopefully-brief icinga puppetrun spam [16:25:56] (03CR) 10BBlack: [C: 032] cache.pp cleanup: split one def per file, move to module loader [puppet] - 10https://gerrit.wikimedia.org/r/203563 (owner: 10BBlack) [16:30:07] PROBLEM - puppet last run on mw1208 is CRITICAL puppet fail [16:30:07] PROBLEM - puppet last run on analytics1038 is CRITICAL puppet fail [16:30:16] PROBLEM - puppet last run on mw1213 is CRITICAL Puppet has 24 failures [16:30:17] PROBLEM - puppet last run on mw1054 is CRITICAL Puppet has 4 failures [16:30:17] PROBLEM - puppet last run on lithium is CRITICAL puppet fail [16:30:27] PROBLEM - puppet last run on db2038 is CRITICAL puppet fail [16:30:27] PROBLEM - puppet last run on mw1114 is CRITICAL Puppet has 38 failures [16:30:37] PROBLEM - puppet last run on virt1001 is CRITICAL puppet fail [16:30:37] PROBLEM - puppet last run on mw1249 is CRITICAL puppet fail [16:30:47] PROBLEM - puppet last run on mw2192 is CRITICAL puppet fail [16:30:47] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 9 failures [16:30:56] PROBLEM - puppet last run on mw2017 is CRITICAL Puppet has 1 failures [16:30:57] PROBLEM - puppet last run on antimony is CRITICAL puppet fail [16:30:57] PROBLEM - puppet last run on mw2143 is CRITICAL Puppet has 1 failures [16:31:06] PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 5 failures [16:31:07] PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 8 
failures [16:31:17] PROBLEM - puppet last run on mw1055 is CRITICAL puppet fail [16:31:17] PROBLEM - puppet last run on mw1211 is CRITICAL Puppet has 32 failures [16:31:28] PROBLEM - puppet last run on db2036 is CRITICAL Puppet has 8 failures [16:31:37] PROBLEM - puppet last run on db1028 is CRITICAL Puppet has 9 failures [16:31:53] * andrewbogott checks ^ [16:31:57] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 8 failures [16:31:58] it's me, see above! [16:32:07] ok, nevermind then! [16:32:37] PROBLEM - puppet last run on wtp2012 is CRITICAL Puppet has 5 failures [16:33:02] basically, that's fallout from restarting puppetmasters. I'm doing that because I've found that defining new in-use classes in new files tends to not get picked up for a while by running puppetmaster processes, which tends to cause even worse spam [16:36:13] cmjohnson1: did that drive show up? [16:36:22] um… replacement for labvirt1004 [16:36:26] PROBLEM - puppet last run on mw2095 is CRITICAL Puppet has 9 failures [16:36:30] not yet...but I expect it today [16:36:33] 'k [16:37:07] PROBLEM - puppet last run on mw2056 is CRITICAL Puppet has 3 failures [16:37:27] PROBLEM - puppet last run on mw1177 is CRITICAL Puppet has 4 failures [16:37:34] wow some of those are really really slow [16:37:36] PROBLEM - puppet last run on mw2146 is CRITICAL Puppet has 4 failures [16:38:07] PROBLEM - puppet last run on mw2030 is CRITICAL Puppet has 3 failures [16:38:07] RECOVERY - puppet last run on db2036 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [16:38:16] RECOVERY - puppet last run on db1028 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [16:38:27] RECOVERY - puppet last run on analytics1038 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:38:27] PROBLEM - puppet last run on mw2092 is CRITICAL Puppet has 8 failures [16:38:36] RECOVERY - puppet last run on lithium is OK Puppet is currently enabled, last run 53 
seconds ago with 0 failures [16:38:46] RECOVERY - puppet last run on mw1114 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:38:47] RECOVERY - puppet last run on db2038 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [16:38:56] RECOVERY - puppet last run on virt1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:38:57] RECOVERY - puppet last run on mw1249 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:39:06] re-running the failed ones [16:39:08] RECOVERY - puppet last run on mw2017 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:39:17] RECOVERY - puppet last run on wtp2012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:17] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:39:27] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:36] RECOVERY - puppet last run on mw1055 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:39:36] PROBLEM - puppet last run on mw1129 is CRITICAL Puppet has 12 failures [16:39:37] RECOVERY - puppet last run on mw1211 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:47] PROBLEM - puppet last run on mw2011 is CRITICAL Puppet has 1 failures [16:40:08] RECOVERY - puppet last run on mw1208 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:40:16] RECOVERY - puppet last run on mw1213 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:40:17] RECOVERY - puppet last run on mw1054 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:40:46] PROBLEM - puppet last run on mw2136 is CRITICAL Puppet has 6 failures [16:40:57] RECOVERY - puppet last run on mw1177 is OK Puppet is currently enabled, last run 1 minute 
ago with 0 failures [16:41:06] RECOVERY - puppet last run on mw2146 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:41:36] RECOVERY - puppet last run on mw2095 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:41:37] RECOVERY - puppet last run on mw2030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:41:56] RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [16:43:07] RECOVERY - puppet last run on mw1129 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:43:17] RECOVERY - puppet last run on mw2011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:44:07] RECOVERY - puppet last run on mw2136 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [16:46:26] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60616 bytes in 0.770 second response time [16:47:07] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [16:47:37] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:47:46] RECOVERY - puppet last run on antimony is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:47:57] RECOVERY - puppet last run on mw2143 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:48:06] (03CR) 10GWicke: Add a generic node service module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [16:48:13] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-JobRunner, 7Graphite: jobrunner metrics audit - https://phabricator.wikimedia.org/T95913#1203465 (10fgiunchedi) 3NEW a:3fgiunchedi [16:48:25] !log re-enabling puppet on caches [16:48:31] Logged the message, Master [16:49:07] RECOVERY - puppet last 
run on mw2056 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:49:17] RECOVERY - puppet last run on mw2192 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:49:54] 6operations, 10MediaWiki-General-or-Unknown, 10MediaWiki-JobRunner, 7Graphite: jobrunner metrics audit - https://phabricator.wikimedia.org/T95913#1203477 (10fgiunchedi) [16:50:07] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 5 failures [16:50:07] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 7 failures [16:50:22] really? :P [16:51:47] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:51:49] oh those are old failures [16:51:51] ok [16:53:28] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [16:53:34] all clear on puppetspam and icinga alerts related to above I think [16:55:18] (03CR) 10GWicke: Add a generic node service module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203334 (https://phabricator.wikimedia.org/T95533) (owner: 10Mobrovac) [17:04:59] 10Ops-Access-Requests, 6operations, 10Analytics: Grant Sati access to geowiki - https://phabricator.wikimedia.org/T95494#1203559 (10Shouston_WMF) @Ottomata, ultimately I'm simply assisting Winifred Olliff and Katy Love with obtaining this data for them to perform analysis. Perhaps if Joady hasn't gotten back... [17:13:35] 10Ops-Access-Requests, 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1203587 (10RobH) 5Open>3Resolved a:3RobH I'm in the camp of 'they'll reopen the access task if it isn't working', and not relying on the user to confirm; so I am resolving this. (Again... 
[17:13:45] 10Ops-Access-Requests, 6operations: Requesting stat1003 access for mholloway - https://phabricator.wikimedia.org/T95506#1203590 (10RobH) a:5RobH>3None [17:15:27] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 1.71 ms [17:15:57] 6operations: reclaim ms1004 back to spares pool (rename to WMF3248) - https://phabricator.wikimedia.org/T86933#1203597 (10RobH) 5stalled>3Resolved This was placed back on spares, but the task resolution neglected. [17:16:37] RECOVERY - SSH on analytics1020 is OK: SSH OK - OpenSSH_5.3 (protocol 2.0) [17:20:20] !log running migratePass0 across all CentralAuth wikis [17:20:24] Logged the message, Master [17:23:54] (03PS1) 10Alex Monk: Follow-up I7cf8a614: Clean up lists of BZ numbers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203861 [17:24:57] (03CR) 10Alex Monk: "See also I720faa25" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/201910 (owner: 10Alex Monk) [17:25:06] (03CR) 10Glaisher: Follow-up I7cf8a614: Clean up lists of BZ numbers (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203861 (owner: 10Alex Monk) [17:25:47] (03PS1) 10coren: Labs: Remove idmap dependency on instances [puppet] - 10https://gerrit.wikimedia.org/r/203864 (https://phabricator.wikimedia.org/T95555) [17:26:23] (03CR) 10Alex Monk: Follow-up I7cf8a614: Clean up lists of BZ numbers (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203861 (owner: 10Alex Monk) [17:27:26] PROBLEM - Host analytics1020 is DOWN: PING CRITICAL - Packet loss = 100% [18:00:58] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 100.00% of data above the critical threshold [24.0] [18:00:59] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 100.00% of data above the critical threshold [35.0] [18:01:03] ‘for whatever reason,’ icinga?
[18:01:03] andrewbogott: that check sounds like a surly teenager ;D [18:01:04] at least there's a reason [18:01:04] one can always ask for a deeper reason, so it's not like icinga could ever tell you an ultimate reason for anything [18:01:04] 42 [18:01:04] duh. [18:01:04] :P [18:01:04] * robh channels surly teenage rob [18:01:05] teenage rob is more angry, yet has more hair on his head [18:01:05] he should be so happy... [18:01:05] youth is wasted on the youth [18:01:05] https://en.wikipedia.org/wiki/Wasted_Youth_%28American_band%29 [18:01:06] mutante: are there more bands called that? [18:01:06] thats the open source version of youtube reference linking ;D [18:01:06] mutante: you could have rick rolled everyone man [18:01:06] hi Krenair in meetings until 12:30pst..i'll ping you then [18:01:06] Krenair unless its urgent [18:01:06] not urgent [18:01:06] ok [18:01:06] JohnFLewis: yes, disambiguation page [18:01:06] robh: haha [18:01:06] Why do I find it funny there was an American and British band at the same time with the same name :p [18:08:20] JohnFLewis: pop music only introduced namespacing in 1990 [18:08:20] prior to that it was all global scope [18:08:20] ori: ah [18:12:10] 6operations, 7HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#1203717 (10Krenair) Still has open blockers, this task shouldn't be closed until everything is defaulting to using HHVM in the wikimedia cluster. 
[18:12:10] operations, HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#1203719 (Krenair) silver is also using PHP 5.3, IIRC [18:19:25] (PS1) Andrew Bogott: Explicitly set default_schedule_zone to 'nova' [puppet] - https://gerrit.wikimedia.org/r/203872 [18:21:05] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [18:22:17] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [18:24:43] operations, ops-fundraising: fundraising system kernel updates - https://phabricator.wikimedia.org/T95887#1203766 (Jgreen) a:Jgreen [18:24:59] operations, ops-eqiad, Analytics-Cluster: analytics1020 hardware failure - https://phabricator.wikimedia.org/T95263#1203785 (Cmjohnson) Replacement board sent was refurbished and had an idrac error. Failed to load idrac. Most likely because it was never reset. A new board will be sent tomorrow.
[18:25:02] operations, ops-fundraising: fundraising system kernel updates - https://phabricator.wikimedia.org/T95887#1203786 (Jgreen) all hosts except db1025, silicon, and barium are done [18:25:05] operations, ops-fundraising: fundraising system kernel updates - https://phabricator.wikimedia.org/T95887#1203787 (Jgreen) p:High>Normal [18:27:43] (PS2) GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - https://gerrit.wikimedia.org/r/203871 [18:28:02] Ops-Access-Requests, operations, Continuous-Integration: Add user wmde-fisch to LDAP group wmde - https://phabricator.wikimedia.org/T95546#1203802 (Dzahn) a:Dzahn [18:28:32] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 77.78% of data above the critical threshold [24.0] [18:28:42] (PS3) GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [18:49:53] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0] [18:49:58] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [18:49:58] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:50:00] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [18:50:00] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 4.22 ms [18:50:01] (CR) Ori.livneh: [C: 2 V: 2] Initial commit [software/brrd] - https://gerrit.wikimedia.org/r/203869 (owner: Ori.livneh) [18:50:01] ori: ^ are you going to make it navtiming specific?
[18:50:01] * YuviPanda is ok with that, can probably still steal other things [18:50:01] PROBLEM - configured eth on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - RAID on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - dhclient process on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - puppet last run on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - DPKG on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - Disk space on labstore1001 is CRITICAL: Connection refused by host [18:50:01] PROBLEM - salt-minion processes on labstore1001 is CRITICAL: Connection refused by host [18:50:01] YuviPanda: it's navtiming specific right now, i'm going to make it more generic [18:50:01] ori: cool :) [18:50:01] (03PS1) 10Merlijn van Deen: Tool labs: silence sudo security e-mails [puppet] - 10https://gerrit.wikimedia.org/r/203876 (https://phabricator.wikimedia.org/T95882) [18:50:04] (03CR) 10Yuvipanda: Tool labs: silence sudo security e-mails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203876 (https://phabricator.wikimedia.org/T95882) (owner: 10Merlijn van Deen) [18:50:09] RECOVERY - configured eth on labstore1001 is OK - interfaces up [18:50:09] RECOVERY - RAID on labstore1001 is OK optimal, 72 logical, 72 physical [18:50:09] RECOVERY - dhclient process on labstore1001 is OK: PROCS OK: 0 processes with command name dhclient [18:50:09] RECOVERY - puppet last run on labstore1001 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [18:50:09] RECOVERY - DPKG on labstore1001 is OK: All packages OK [18:50:09] RECOVERY - Disk space on labstore1001 is OK: DISK OK [18:50:09] RECOVERY - salt-minion processes on labstore1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:55:15] godog: hey! [18:55:15] statsite 5958 1.3 0.0 95448 10916 ? 
Ss Apr10 63:09 /usr/bin/statsite -f /etc/statsite/8125.ini [18:55:15] root@labmon1001:/srv/carbon/whisper# sudo service statsite status [18:55:15] statsite stop/waiting [18:55:15] * YuviPanda isn’t sure what’s happening [18:55:17] (03PS2) 10Merlijn van Deen: Tool labs: silence sudo security e-mails [puppet] - 10https://gerrit.wikimedia.org/r/203876 (https://phabricator.wikimedia.org/T95882) [18:55:17] (03CR) 10Merlijn van Deen: "All tabs now, as /etc/sudoers is." [puppet] - 10https://gerrit.wikimedia.org/r/203876 (https://phabricator.wikimedia.org/T95882) (owner: 10Merlijn van Deen) [18:55:18] (03PS2) 10Dzahn: Apply zotero-admin to SCA [puppet] - 10https://gerrit.wikimedia.org/r/202762 (https://phabricator.wikimedia.org/T95400) (owner: 10Alexandros Kosiaris) [18:55:18] (03CR) 10Dzahn: [C: 032] "approved in ops meeting" [puppet] - 10https://gerrit.wikimedia.org/r/202762 (https://phabricator.wikimedia.org/T95400) (owner: 10Alexandros Kosiaris) [18:55:18] YuviPanda: sigh, yeah that's an artifact of running multiple instances and confusing, use statsitectl [18:55:18] YuviPanda: hum, mailrrelay.pp does use sudo:: so I'm confused why we get emails at all. Well, better wait until nfs is back up, then I'll check what's actually in sudoers [18:55:18] valhallasw`cloud: yeah, and role/labs which is applied in all of labs includes sudo [18:55:18] ah, right [18:55:18] godog: oh, sad :( no default ‘service’ implementation [18:55:19] hello ops! could someone review and merge a small job runner change https://gerrit.wikimedia.org/r/#/c/203379/ ? :) [18:55:20] YuviPanda: I guess I'll have to learn hard puppet stuff then, as I'm not sure setting !mail_no_user is right for the general case [18:59:07] legoktm: Did you talk to Sean about the fact that we're going to be very evil(tm) to the databases? [18:59:07] valhallasw`cloud: puppet is quite fun! 
not that hard :) [18:59:07] valhallasw`cloud: we can make that happen just for tools [18:59:07] YuviPanda: https://github.com/wikimedia/operations-puppet/blob/production/modules/sudo/manifests/user.pp looks sort of hard :P [18:59:08] hoo: I haven't....how are we going to be very evil? [18:59:08] valhallasw`cloud: we don’t want an user, we just want to have the config set up [18:59:08] YuviPanda: ya, I know [18:59:08] YuviPanda: but the clean way to do that is to add a 'config setting' thing like that user class [18:59:08] hoo: sending an email to the ops list about what we're doing is on my todo list though [18:59:08] or maybe just a sudo::no_email class [18:59:09] legoktm: Ok... I have no idea about the exact amount, but I guess that we're going to do a lot of update (renames) [18:59:09] * updats [18:59:09] * updates, doh [18:59:09] valhallasw`cloud: yeah but for labs we don’t use sudo for users. we use ldap [18:59:09] :D [18:59:09] so we don’t really have sudo::user [18:59:09] on labs [18:59:09] and page moves which trigger watchlist emails and fun stuff ;) [18:59:09] YuviPanda: that's not why I linked it :P [18:59:09] YuviPanda: to change a config setting, there should be a class that /looks like that/ [18:59:09] (03PS4) 10GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [18:59:09] YuviPanda: there is but not what puppet configures, I made a note about fixing that, ideally we wouldn't need multiple instances anymore [18:59:25] YuviPanda: but if we just do it for tool labs, I guess my solution is also OK [19:03:59] valhallasw`cloud: yeah, I’m ok with that. needs testing tho [19:04:00] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [19:04:01] (03CR) 10Andrew Bogott: [C: 04-1] "I use this feature for instance migration within labs. 
I'm pretty sure I don't need it anymore, but please don't merge this until I'm thr" [puppet] - 10https://gerrit.wikimedia.org/r/199936 (owner: 10Chad) [19:04:01] ta [19:04:06] (03PS3) 10Merlijn van Deen: Tool labs: silence sudo security e-mails [puppet] - 10https://gerrit.wikimedia.org/r/203876 (https://phabricator.wikimedia.org/T95882) [19:04:07] (03PS5) 10GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [19:04:34] (03PS2) 10Rush: Explicitly set default_schedule_zone to 'nova' [puppet] - 10https://gerrit.wikimedia.org/r/203872 (owner: 10Andrew Bogott) [19:04:50] (03CR) 10Rush: [C: 031] Explicitly set default_schedule_zone to 'nova' [puppet] - 10https://gerrit.wikimedia.org/r/203872 (owner: 10Andrew Bogott) [19:05:37] (03CR) 10GWicke: [C: 04-1] "Untested & pushed as WIP for discussion of the general approach." [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [19:06:49] (03CR) 10Anomie: ""/api/v1/" seems overly generic when this is actually only restbase. What if we decide to rewrite /api/something/ to /w/api.php at some po" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [19:09:14] (03CR) 10GWicke: "@Anomie, that case is discussed in the linked task. One option is to use /api/action/. We *could* call this something like /api/rest/v1/ i" [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [19:09:36] PROBLEM - NTP on labstore1001 is CRITICAL: NTP CRITICAL: Offset unknown [19:10:06] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 60.00% of data above the critical threshold [35.0] [19:11:18] YuviPanda: how does one go about testing these manifests? 
:-p [19:11:40] 6operations, 10ops-eqiad, 6Labs: labvirt1004 has a failed disk 1789-Slot 0 Drive Array Disk Drive(s) Not Responding Check cables or replace the following drive(s): Port 1I: Box 1: Bay 1 - https://phabricator.wikimedia.org/T95622#1204203 (10Cmjohnson) The disk has been replaced. The tracking n... [19:11:45] valhallasw`cloud: there is the toolsbeta project! [19:11:50] I shall add you to it [19:11:53] YuviPanda: yes, but then it already has to be merged? [19:12:06] or does it deploy a different branch or something? [19:12:40] 6operations, 10ops-eqiad, 6Labs: labvirt1004 has a failed disk 1789-Slot 0 Drive Array Disk Drive(s) Not Responding Check cables or replace the following drive(s): Port 1I: Box 1: Bay 1 - https://phabricator.wikimedia.org/T95622#1204204 (10Cmjohnson) 5Open>3Resolved [19:13:17] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [20:15:02] Different master you can cherry pick to [20:15:02] valhallasw`cloud: [20:15:02] (03PS1) 10Andrew Bogott: Mark a couple of obsolete settings as obsolete. [puppet] - 10https://gerrit.wikimedia.org/r/203879 [20:15:02] I'm out for lunch I'll poke when back? [20:15:03] valhallasw`cloud: also for this one just put the file on tools-bastion-02 and try to sudo wrong [20:15:04] godog: re https://gerrit.wikimedia.org/r/#/c/196335/ , urandom will bring the patch up to date so maybe set tomorrow for merging it? 
[20:15:06] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 77.78% of data above the critical threshold [35.0] [20:15:12] (03PS5) 10devunt: Add Josa extension and deploy to ko.wikipedia.beta.wmflabs.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203627 (https://phabricator.wikimedia.org/T15712) [20:15:13] PROBLEM - zuul_gearman_service on gallium is CRITICAL: Connection refused [20:15:15] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server [20:15:16] ^ Acknowledged: Intentionally terminated until Labs is back up [20:15:16] (03PS6) 10devunt: Add Josa extension and deploy to ko.wikipedia.beta.wmflabs.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203627 (https://phabricator.wikimedia.org/T15712) [20:15:18] (03PS2) 10Eevans: move cassandra submodule into puppet repo [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) [20:15:21] (03CR) 10Eevans: "Patch 2 is rebased to current operations/puppet/cassandra" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [20:15:23] (03PS1) 10Mobrovac: service::node: fix the look-up of undefined variables [puppet] - 10https://gerrit.wikimedia.org/r/203886 (https://phabricator.wikimedia.org/T95533) [20:15:24] urandom: anything special to do with that patch other than merge it? 
[20:15:24] ori: i hope not, I had an awful time with the rebase [20:15:24] PROBLEM - RAID on labstore1001 is CRITICAL: Connection refused by host [20:15:24] PROBLEM - SSH on labstore1001 is CRITICAL: Connection refused [20:15:24] ori: should be good to go [20:15:24] PROBLEM - dhclient process on labstore1001 is CRITICAL: Connection refused by host [20:15:24] PROBLEM - puppet last run on labstore1001 is CRITICAL: Connection refused by host [20:15:24] PROBLEM - DPKG on labstore1001 is CRITICAL: Connection refused by host [20:15:24] PROBLEM - Disk space on labstore1001 is CRITICAL: Connection refused by host [20:15:24] PROBLEM - salt-minion processes on labstore1001 is CRITICAL: Timeout while attempting connection [20:15:24] YuviPanda: right. Not sure how to sudo wrong with my account though :p maybe as tool or something [20:15:24] ori: I guess something should be done with operations/puppet/cassandra so that no one tries to make changes there [20:15:25] urandom: thnx! [20:15:25] YuviPanda: but this can be done when everything is back alive [20:15:25] ori: it's good to go, ops blessed it, godog volunteered to merge it [20:15:25] PROBLEM - configured eth on labstore1001 is CRITICAL: Timeout while attempting connection [20:15:25] urandom was just bringing it up to date [20:15:26] PROBLEM - Host labstore1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:15:26] urandom, mobrovac: should we disable puppet on the cassandra nodes [20:15:26] and then apply it on one, to make sure it's ok, and then re-enable? 
[20:15:26] !log dbstore1001 s2 delayed replication resumed, T95426 [20:15:26] RECOVERY - SSH on labstore1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0) [20:15:26] RECOVERY - Host labstore1001 is UP: PING OK - Packet loss = 0%, RTA = 1.88 ms [20:15:26] ori: lets run the compiler first to see if anything changes [20:15:26] (PS1) Hashar: Merge tag 'v0.12.0' [debs/jenkins-debian-glue] - https://gerrit.wikimedia.org/r/203888 [20:15:27] gwicke: ori: that should be a noop for puppet [20:15:27] in terms of catalogue and stuff [20:15:27] yes, but lets verify [20:15:27] should :) [20:15:27] * gwicke starts a compiler run [20:15:27] damn, jenkins is down because of labs breakage [20:15:27] gwicke: yep. [20:15:27] oh really [20:15:27] hashar: I stopped Jenkins [20:15:27] It was crashing on the timed out ssh connections [20:15:27] I can restart now, but it will just fail again [20:15:27] springle: thanks! any ideas on how long the dbstore1002 resync will take? [20:15:27] Krinkle: na keep it down :) [20:15:27] Krinkle: if labs is dead, there is not much we can do [20:15:27] yeah [20:15:27] hashar: I can start it but keep Zuul down [20:15:27] because it takes long to start jenkins [20:15:27] legoktm: couple hours. it only needs to touch centralauth [20:15:28] (CR) GWicke: [C: -1] "Lets wait with the merge until we have verified that this won't introduce config changes by running it through the puppet compiler.
The co" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [20:15:28] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [20:15:28] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [20:15:31] Krinkle: it takes only a few minutes to start Jenkins, though the web interface takes a while [20:15:31] Krinkle: the german client is initialized and register the jobs before the web interface starts :D So it is sneakily running in the background! [20:15:33] hehe german client [20:15:33] RECOVERY - Disk space on labstore1001 is OK: DISK OK [20:15:33] RECOVERY - salt-minion processes on labstore1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:15:34] RECOVERY - configured eth on labstore1001 is OK - interfaces up [20:15:34] RECOVERY - RAID on labstore1001 is OK optimal, 72 logical, 72 physical [20:15:34] RECOVERY - dhclient process on labstore1001 is OK: PROCS OK: 0 processes with command name dhclient [20:15:34] RECOVERY - puppet last run on labstore1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:15:34] RECOVERY - DPKG on labstore1001 is OK: All packages OK [20:15:34] damn [20:15:34] s/german/Gearman/ [20:15:34] (03PS2) 10Andrew Bogott: Mark a couple of obsolete settings as obsolete. 
[puppet] - 10https://gerrit.wikimedia.org/r/203879 [20:15:34] (03Abandoned) 10Hashar: Merge tag 'v0.12.0' [debs/jenkins-debian-glue] - 10https://gerrit.wikimedia.org/r/203888 (owner: 10Hashar) [20:18:50] !log demon Synchronized php-1.25wmf24/includes/media/XMP.php: adhocdebug (duration: 00m 12s) [20:18:53] Logged the message, Master [20:19:35] 6operations, 6Labs, 10hardware-requests: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1204471 (10RobH) 5stalled>3Resolved [20:20:21] !log demon Synchronized php-1.25wmf24/includes/media/XMP.php: rm useless debugging (duration: 00m 15s) [20:20:26] Logged the message, Master [20:24:46] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/local/bin/zuul-server [20:26:03] RECOVERY - zuul_gearman_service on gallium is OK: TCP OK - 0.000 second response time on port 4730 [20:26:03] (03CR) 10GWicke: "A run just completed, but shows an error from the submodule -> code change: https://integration.wikimedia.org/ci/job/operations-puppet-cat" [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [20:27:27] (03PS2) 10Rush: labnodepool1001: standard + add CI admins [puppet] - 10https://gerrit.wikimedia.org/r/203798 (https://phabricator.wikimedia.org/T95303) (owner: 10Hashar) [20:27:50] (03CR) 10Rush: [C: 032 V: 032] labnodepool1001: standard + add CI admins [puppet] - 10https://gerrit.wikimedia.org/r/203798 (https://phabricator.wikimedia.org/T95303) (owner: 10Hashar) [20:28:38] Krinkle: Is jenkins functional again? Do we need to re-trigger? [20:28:38] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 88.89% of data above the critical threshold [24.0] [20:28:38] hoo: Yes. [20:28:38] hoo: Though beta deployment is still down for now. 
I'll start that after things settle [20:31:04] chasemp: thx :) [20:32:47] 6operations, 3Continuous-Integration-Isolation: Remove hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1204502 (10chasemp) p:5Normal>3Low [20:33:12] 6operations, 3Continuous-Integration-Isolation: Remove hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1186041 (10chasemp) 5Open>3stalled Please don't close this as it is to remind me to ensure this access is revoked as appropriate. [20:33:15] 6operations, 3Continuous-Integration-Isolation: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1204509 (10chasemp) [20:34:31] 6operations, 3Continuous-Integration-Isolation: Remove hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1204512 (10hashar) I can confirm the access works just fine. Thanks! [20:34:56] 6operations, 6Phabricator, 6Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1060488 (10Aklapper) >>! In T90491#1151638, @atgo wrote: > prefixing the Project with //acl*// moves it to the top of the list of projects a user is a member of... [20:35:22] hashar: yes good to get thigns going :) I had to pinky swear a blood oath I would remember to remove tho [20:35:33] 6operations, 3Continuous-Integration-Isolation: Remove hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1204538 (10Legoktm) >>! In T95303#1204130, @RobH wrote: > Ops meeting disucssion resulted in approval, with conditions that Chase is aware of. As he will hand... [20:36:52] 6operations, 3Continuous-Integration-Isolation: Remove hashar root access on to be installed labnodepool1001 - https://phabricator.wikimedia.org/T95303#1204540 (10chasemp) >>! In T95303#1204538, @Legoktm wrote: >>>! 
In T95303#1204130, @RobH wrote: >> Ops meeting disucssion resulted in approval, with conditions... [20:37:14] cmjohnson1: Any idea what this is? [ 8.278694] power_meter ACPI000D:00: Ignoring unsafe software power cap! [20:37:41] For all I know they’ve all been doing that, and I just happened to be looking at virt1004 at the right time. [20:37:52] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204548 (10RobH) 'hardware is getting old' is not a valid reasoning. So this cannot be easily upgraded in place, an... [20:40:18] (03PS1) 10Yuvipanda: tools: Switch portgranter range again [puppet] - 10https://gerrit.wikimedia.org/r/203946 [20:40:45] (03PS3) 10Yuvipanda: tools: time out webservice commands after 30s waiting for job [puppet] - 10https://gerrit.wikimedia.org/r/203682 [20:41:03] (03PS2) 10Yuvipanda: tools: Switch portgranter range again [puppet] - 10https://gerrit.wikimedia.org/r/203946 [20:41:24] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204574 (10chasemp) >>! In T95760#1204548, @RobH wrote: > 'hardware is getting old' is not a valid reasoning. > > S... [20:41:27] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [20:44:04] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204592 (10RobH) Gallium is the following: Single CPU: Intel(R) Xeon(R) CPU X3450 @ 2.67GHz Dual 500GB S... 
[20:45:04] (PS3) Yuvipanda: tools: Switch portgranter range again [puppet] - https://gerrit.wikimedia.org/r/203946 [20:46:47] (CR) Yuvipanda: [C: 2] tools: Switch portgranter range again [puppet] - https://gerrit.wikimedia.org/r/203946 (owner: Yuvipanda) [20:47:41] (PS1) Ori.livneh: Add brrd module [puppet] - https://gerrit.wikimedia.org/r/203948 [20:49:26] (CR) Deskana: "We need the data that this patch generates. That said, I am not competent to review the code. :-)" [mediawiki-config] - https://gerrit.wikimedia.org/r/203250 (owner: Bmansurov) [20:49:37] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [20:51:19] (CR) Merlijn van Deen: [C: 1] "I love readable times" [software/tools-manifest] - https://gerrit.wikimedia.org/r/203513 (owner: Yuvipanda) [20:52:14] operations, ops-eqiad, Labs: labvirt1004 has a failed disk 1789-Slot 0 Drive Array Disk Drive(s) Not Responding Check cables or replace the following drive(s): Port 1I: Box 1: Bay 1 - https://phabricator.wikimedia.org/T95622#1204625 (Andrew) labvirt1004 is working fine now. Thank you!
[20:52:19] (03CR) 10Merlijn van Deen: [C: 032] Use isoformat in datetime logs, rather than asctime [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/203513 (owner: 10Yuvipanda) [20:52:39] (03Merged) 10jenkins-bot: Use isoformat in datetime logs, rather than asctime [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/203513 (owner: 10Yuvipanda) [20:53:48] (03PS2) 10Ori.livneh: Add brrd module [puppet] - 10https://gerrit.wikimedia.org/r/203948 [20:54:52] (03CR) 10Ori.livneh: [C: 032] Add brrd module [puppet] - 10https://gerrit.wikimedia.org/r/203948 (owner: 10Ori.livneh) [20:56:08] (03PS1) 10Ori.livneh: Add 'brrd-run' entry-point [software/brrd] - 10https://gerrit.wikimedia.org/r/203950 [20:56:19] (03CR) 10Ori.livneh: [C: 032 V: 032] Add 'brrd-run' entry-point [software/brrd] - 10https://gerrit.wikimedia.org/r/203950 (owner: 10Ori.livneh) [20:57:23] (03PS1) 10Ori.livneh: Fix typo in template name [puppet] - 10https://gerrit.wikimedia.org/r/203951 [20:57:32] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix typo in template name [puppet] - 10https://gerrit.wikimedia.org/r/203951 (owner: 10Ori.livneh) [20:58:21] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0] [20:58:27] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204644 (10RobH) a:3Cmjohnson I'm thinking about allocating system cobalt for this, but I need to assign this task... [20:59:01] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204647 (10RobH) p:5High>3Normal @andrew: You set this to high priority, but it seems to be generally not any hi... 
[21:00:55] operations, Continuous-Integration, hardware-requests, Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204652 (RobH) a:Cmjohnson>RobH [21:01:10] PROBLEM - puppet last run on eventlog1001 is CRITICAL Puppet has 1 failures [21:01:22] operations, Continuous-Integration, hardware-requests, Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204654 (hashar) Gallium has some SSD disk but the process that makes use of it are moving to some other machines.... [21:02:12] RECOVERY - puppet last run on eventlog1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [21:03:43] (CR) Bmansurov: "Deskana, we can pull some data from the beta site. What are you interested in, specifically?" [mediawiki-config] - https://gerrit.wikimedia.org/r/203250 (owner: Bmansurov) [21:06:32] greg-g: can I get a deployment window tomorrow for CentralAuth/SUL updates?
[21:07:33] 6operations: Enable Extension:Lockdown on OTRS Wiki - https://phabricator.wikimedia.org/T95954#1204670 (10Mdennis-WMF) 3NEW [21:11:11] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [21:13:48] oh, wow [21:13:51] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [21:13:51] lockdown extension [21:16:31] icinga-wm: WHATEVER MAN [21:22:22] legoktm: take what you need for that stuff [21:22:44] legoktm: consider yourself deputized for the rest of the week/sul finalization work to deploy stuffz [21:23:06] (03PS1) 10Hashar: Merge remote-tracking branch 'wikimedia/upstream' into debian [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203960 [21:23:08] (03PS1) 10Hashar: Initial Debian packaging [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 [21:23:13] mouahahhaa [21:23:21] :) [21:23:33] legoktm: limited to SUL kthx [21:23:35] :) [21:23:47] * greg-g goes to yet more meetings [21:24:15] valhallasw`cloud: you have root on toolsbeta now [21:24:21] (03CR) 10Hashar: "That is brings the upstream source code in our orphan 'debian' branch. I have picked up a recent commit and adjusted the .gitreview to mat" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203960 (owner: 10Hashar) [21:24:27] YuviPanda: ok. don't have time to do testing now, though [21:24:35] valhallasw`cloud: yeah, that’s cool. [21:24:42] valhallasw`cloud: also re: how to test sudo failure when you have sudo yourself [21:24:52] valhallasw`cloud: you sudo to someone else who doesn’t have those privilages :) [21:25:00] YuviPanda: oh, right [21:25:05] valhallasw`cloud: ‘sudo -u www-data /bin/bash’ for example [21:25:07] and then attempt to sudo [21:25:13] YuviPanda: also 'make sure to keep a su shell open when you fool around with sudoers' [21:25:13] ;-) [21:25:18] :D [21:25:25] greg-g: thanks :) [21:25:26] valhallasw`cloud: true. 
worst case, however, I’ve a root key [21:25:48] valhallasw`cloud: also do it on tools-bastion-02 (-dev replacement) that is still out of rotation [21:25:55] ta [21:26:18] (03PS2) 10Hashar: Initial Debian packaging [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) [21:27:19] (03PS4) 10Yuvipanda: tools: time out webservice commands after 30s waiting for job [puppet] - 10https://gerrit.wikimedia.org/r/203682 [21:27:23] (03CR) 10Hashar: "Dependencies are probably off, I have originally created it with some outdated upstream code base :/" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (https://phabricator.wikimedia.org/T89142) (owner: 10Hashar) [21:27:27] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: time out webservice commands after 30s waiting for job [puppet] - 10https://gerrit.wikimedia.org/r/203682 (owner: 10Yuvipanda) [21:28:29] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool - https://phabricator.wikimedia.org/T89142#1028174 (10hashar) [21:30:57] YuviPanda: do you have some time to review & merge https://gerrit.wikimedia.org/r/#/c/203379/ ? [21:31:14] * YuviPanda looks [21:32:09] legoktm: I don’t know enough about that part of our infra, sorry. You need _joe_ I think. 
[21:34:30] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204862 (10RobH) 3NEW a:3RobH [21:34:45] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1199645 (10RobH) [21:34:48] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204877 (10RobH) [21:35:19] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204862 (10RobH) [21:35:24] 6operations, 10Continuous-Integration, 10hardware-requests, 3Continuous-Integration-Isolation: eqiad: (1) allocate server to migrate Zuul server to - https://phabricator.wikimedia.org/T95760#1204879 (10RobH) 5Open>3Resolved Cobalt is allocated for this task. System setup will proceed on T95959. Resol... [21:45:33] robh: Added alias #hardware-request for you :) [21:45:50] https://phabricator.wikimedia.org/T95760#1204879 [21:45:55] It wasn't rendering before. [21:46:24] https://phabricator.wikimedia.org/project/profile/1014/ [21:47:24] YuviPanda: is https://phabricator.wikimedia.org/T93644 done then? [21:47:35] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204957 (10RobH) @chasemp will be chasing down the network requirements. Cobalt needs to talk to labs hosts, which means it would curren... [21:47:50] Krinkle: eh? [21:47:55] chasemp: yup! :) [21:48:05] chasemp: just closed it. Thanks a lot! 
\o/ [21:48:47] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1204964 (10chasemp) a:5RobH>3chasemp [21:49:10] robh: You mentioned #hardware-request in the message, but the project on Phab is plural. [21:49:17] oh [21:49:22] heh [21:49:46] 6operations, 6Engineering-Community, 6WMF-Legal, 6WMF-NDA: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1204970 (10chasemp) Process as I understand it currently and as it is approved by legal https://wikitech.wikimedia.org/w/index.php?title=Volunteer_NDA&oldi... [21:54:50] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [21:56:56] !log restarting gitblot [21:57:06] Logged the message, Master [21:57:06] mutante: gitblot :p [21:57:28] itym gitblat [21:57:36] 6operations, 6Engineering-Community, 6WMF-Legal, 6WMF-NDA: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1205012 (10chasemp) 5Open>3Resolved >>! In T655#1204970, @chasemp wrote: > Process as I understand it currently and as it is approved by legal > > http... [21:57:36] http://sourceforge.net/p/blootbot/wiki/Home/ [21:57:45] anybody around who knows puppet-compiler02.eqiad.wmflabs ? [21:58:00] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1205022 (10RobH) for whoever does the install server update (I didn't do it yet, since we aren't yet certain of the fqdn.) NIC1 Ethernet... 
[21:58:09] it looks like compiling https://gerrit.wikimedia.org/r/#/c/196335/ will require some manual intervention [21:58:41] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1205026 (10RobH) [21:59:31] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60617 bytes in 0.165 second response time [22:00:21] (03CR) 10GWicke: "I think somebody with access to puppet-compiler02.eqiad.wmflabs will need to manually delete the submodule checkout." [puppet] - 10https://gerrit.wikimedia.org/r/196335 (https://phabricator.wikimedia.org/T92560) (owner: 10Eevans) [22:01:16] thanks hoo, wow 1.371.736 pages o.o [22:01:38] You're welcome :) [22:02:31] And I think that's the highscore :D [22:02:39] XD [22:02:45] I once deleted close to 1M, but never that many [22:03:13] one entry for record guinness :P [22:05:30] mhmhm, how is possible? spanish wiktionary has 834,571 articles [22:05:52] http://es.wiktionary.org/w/api.php?action=query&meta=siteinfo&siprop=statistics&maxlag=5 [22:06:25] Hprmedina: It's one entry for each page and one for the talk [22:06:28] I think [22:06:41] ahhh... XD [22:07:10] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1205044 (10hashar) Do we have any 30000 feet network diagrams of our vlan / zones / whatever? That would assist in figuring out how machi... [22:07:37] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation: install/setup/deploy cobalt as replacement for gallium - https://phabricator.wikimedia.org/T95959#1205047 (10RobH) [22:07:46] then will be 685,868 pages ... 82% of the wiki... 
uhhh [22:08:01] 6operations, 7HHVM: Complete the use of HHVM over Zend PHP on the Wikimedia cluster - https://phabricator.wikimedia.org/T86081#1205055 (10Legoktm) [22:09:37] (03PS3) 10Hashar: Initial Debian packaging [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 [22:12:21] (03CR) 10Rush: [C: 031] "should work and for the moment it may be the most sane option" [puppet] - 10https://gerrit.wikimedia.org/r/203969 (owner: 10Andrew Bogott) [22:17:22] (03CR) 10Hashar: "Bumped the changelog version since I have updated the upstream code since previous patchset" [debs/nodepool] (debian) - 10https://gerrit.wikimedia.org/r/203961 (owner: 10Hashar) [22:17:50] (03CR) 10Andrew Bogott: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/203879 (owner: 10Andrew Bogott) [22:18:32] (03PS3) 10Andrew Bogott: Mark a couple of obsolete settings as obsolete. [puppet] - 10https://gerrit.wikimedia.org/r/203879 [22:18:44] (03PS2) 10Andrew Bogott: Add a custom pool-based filter for nova scheduler. [puppet] - 10https://gerrit.wikimedia.org/r/203969 [22:19:03] 6operations: Purge > 90 days stat1002:/a/squid/archive/edits - https://phabricator.wikimedia.org/T92339#1205112 (10kevinator) @Ottomata Do not delete this yet... it may be needed for something else. Please ask me before removing the data. [22:19:12] 6operations: Purge > 90 days stat1002:/a/squid/archive/edits - https://phabricator.wikimedia.org/T92339#1205115 (10kevinator) p:5Normal>3Low [22:22:42] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool - https://phabricator.wikimedia.org/T89142#1205139 (10hashar) I have created the Gerrit repository [[ https://gerrit.wikimedia.org/r/#/admin/projects/operations/deb... 
[22:24:57] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1205146 (10hashar) [22:26:14] (03CR) 10Deskana: "I highly doubt that how users use Beta Labs is in any way indicative of how users use Wikipedia. We really need production data." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203250 (owner: 10Bmansurov) [22:26:28] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1028174 (10hashar) [22:26:41] (03CR) 10Andrew Bogott: [C: 032] Mark a couple of obsolete settings as obsolete. [puppet] - 10https://gerrit.wikimedia.org/r/203879 (owner: 10Andrew Bogott) [22:26:58] (03CR) 10Bmansurov: "Without merging this patch, I don't think we can get production data." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203250 (owner: 10Bmansurov) [22:27:06] (03CR) 10Andrew Bogott: [C: 032] Add a custom pool-based filter for nova scheduler. [puppet] - 10https://gerrit.wikimedia.org/r/203969 (owner: 10Andrew Bogott) [22:28:11] (03PS17) 10Ori.livneh: Gzip SVGs on back upload varnishes. 
[puppet] - 10https://gerrit.wikimedia.org/r/108484 (https://bugzilla.wikimedia.org/54291) [22:29:05] bblack: rebased ^ (in case you were impatient) [22:30:31] PROBLEM - puppet last run on virt1000 is CRITICAL Puppet has 1 failures [22:31:02] (03PS1) 10BBlack: Bugfix (I hope) for labs breakage from 3d2fadd5 [puppet] - 10https://gerrit.wikimedia.org/r/203976 [22:33:08] (03CR) 10BBlack: [C: 032] Bugfix (I hope) for labs breakage from 3d2fadd5 [puppet] - 10https://gerrit.wikimedia.org/r/203976 (owner: 10BBlack) [22:33:10] (03PS1) 10Andrew Bogott: Include openstack_version param in the scheduler class [puppet] - 10https://gerrit.wikimedia.org/r/203977 [22:35:54] (03CR) 10Andrew Bogott: [C: 032] Include openstack_version param in the scheduler class [puppet] - 10https://gerrit.wikimedia.org/r/203977 (owner: 10Andrew Bogott) [22:36:16] bblack: shall I merge? [22:37:00] oh, yes, sure [22:37:02] sorry :) [22:37:17] oops now you’re here [22:37:24] merged! [22:38:41] RECOVERY - puppet last run on virt1000 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:47:08] 6operations, 7HHVM, 7Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#1205217 (10hashar) [22:47:31] 6operations, 7HHVM, 7Tracking: Complete the use of HHVM over Zend PHP on the Wikimedia cluster (tracking) - https://phabricator.wikimedia.org/T86081#961194 (10hashar) I have make this ticket to be oviously a #tracking task. [22:51:29] (03PS1) 10Andrew Bogott: Simplify virt100x node definitions. [puppet] - 10https://gerrit.wikimedia.org/r/203980 [22:51:31] (03PS1) 10Andrew Bogott: Make labvirt100x into compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/203981 [22:57:19] (03PS2) 10Andrew Bogott: Make labvirt100x into compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/203981 [22:57:21] (03PS2) 10Andrew Bogott: Simplify virt100x node definitions. 
[puppet] - 10https://gerrit.wikimedia.org/r/203980 [22:59:17] (03CR) 10Andrew Bogott: [C: 032] Simplify virt100x node definitions. [puppet] - 10https://gerrit.wikimedia.org/r/203980 (owner: 10Andrew Bogott) [23:00:04] RoanKattouw, ^d, Krenair, kaldari: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150413T2300). [23:00:23] (03PS1) 10BryanDavis: Use tox lint yaml and run flake8 [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/203984 (https://phabricator.wikimedia.org/T95894) [23:00:25] (03PS1) 10BryanDavis: Fix PEP-8 style [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/203985 [23:01:41] ^d, you doing that? [23:03:20] PROBLEM - git.wikimedia.org on antimony is CRITICAL - Socket timeout after 10 seconds [23:03:36] (03PS6) 10GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [23:04:34] (03CR) 10Andrew Bogott: [C: 032] Make labvirt100x into compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/203981 (owner: 10Andrew Bogott) [23:05:08] gitblit :/ mutante [23:05:13] hmm [23:05:21] kaldari, hoo: you there? [23:05:34] yes [23:05:43] yep [23:07:56] Where is ^d... [23:08:40] andrewbogott / jgage ^^ restart gitblit please? :) [23:08:42] (03PS2) 10BryanDavis: Use tox lint yaml and run flake8 [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/203984 (https://phabricator.wikimedia.org/T95894) [23:08:55] JohnFLewis: I can do it if you tell me how [23:09:18] (03CR) 10BBlack: [C: 031] "Sane on all the technical bits re: varnish config" [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [23:09:28] kaldari, hoo: I'm not really able to right now, but you two can both deploy right? [23:09:40] I can, yes [23:11:10] kaldari: Can I do my stuff or do you want first? 
[23:11:23] andrewbogott: I'd imagine: service gitblit restart on antimony [23:11:46] go ahead [23:11:54] JohnFLewis: is that better? [23:12:16] andrewbogott: nope http://git.wikimedia.org/ [23:12:20] PROBLEM - configured eth on labvirt1004 is CRITICAL: eth1 reporting no carrier. [23:12:22] gitblit takes a few minutes to restart, be aware [23:12:29] It's very heavy [23:12:34] hoo: probably that then :) [23:12:51] PROBLEM - puppet last run on labvirt1004 is CRITICAL Puppet has 2 failures [23:13:33] andrewbogott: also maybe !log it so we can see how annoying this is :) [23:13:59] !log restarted gitblit on antimony [23:14:09] JohnFLewis: seems to be back [23:14:15] JohnFLewis: https://gerrit.wikimedia.org/r/#/c/188480/ :p [23:14:21] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 60620 bytes in 0.163 second response time [23:14:53] That was quite fast, actually [23:14:58] * hoo hides [23:15:04] mutante: we must get rid of it :P [23:15:38] JohnFLewis: replaced by Diffusion? [23:15:38] people say having icinga do this is bad, make YuviPanda|brb convince them shinken is great at this (if even possible) and push it for prod ;) [23:15:48] mutante: I'd imagine so [23:16:04] needs redirects of old urls ? [23:16:28] the plan is to have Phab being this almighty tool sent from the heavens in that it can do everything but have the design team work in one place [23:16:43] * hoo kicks jenkins [23:17:12] PROBLEM - configured eth on labvirt1002 is CRITICAL: eth1 reporting no carrier. [23:17:19] JohnFLewis: switch git.wm to Apache cluster, add redirect magic to phab diffusion pages? :p [23:17:20] PROBLEM - puppet last run on labvirt1002 is CRITICAL Puppet has 2 failures [23:17:45] likes to see those icinga-wm messages in a way because they show the "eth" check is working [23:17:45] handling the redirect would be hard as though [23:17:52] yes [23:18:07] I think we go from gitblit's sense to phab's 'smash your face into the keyboard. done!' 
system [23:18:23] <^d> Doing huh? [23:19:07] <^d> Krenair: ^? [23:19:41] PROBLEM - puppet last run on labvirt1006 is CRITICAL Puppet has 2 failures [23:20:21] PROBLEM - configured eth on labvirt1006 is CRITICAL: eth1 reporting no carrier. [23:20:59] PROBLEM - configured eth on labvirt1001 is CRITICAL: eth1 reporting no carrier. [23:21:00] PROBLEM - puppet last run on labvirt1001 is CRITICAL Puppet has 2 failures [23:21:04] ^ these labvirtxxxx things are just those hosts stirring to life. [23:21:31] Still waiting for jenkins... [23:22:00] (03PS3) 10BryanDavis: Use tox lint yaml and run flake8 [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/203984 (https://phabricator.wikimedia.org/T95894) [23:22:41] Krenair: Sorry, I never actually answered your question. Yes, I’m fine deploying the mobile stuff myself. [23:22:57] assuming Jenkins ever merges it :P [23:23:10] PROBLEM - nova-compute process on labvirt1006 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [23:23:31] I think I'll force merge... stuff passed jenkins earlier one and is still against HEAD [23:23:39] * earlier on [23:23:39] PROBLEM - nova-compute process on virt1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [23:24:00] PROBLEM - nova-compute process on labvirt1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [23:24:50] PROBLEM - configured eth on labvirt1005 is CRITICAL: eth1 reporting no carrier. [23:25:09] PROBLEM - nova-compute process on labvirt1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [23:25:20] PROBLEM - puppet last run on labvirt1005 is CRITICAL Puppet has 2 failures [23:27:13] !log hoo Synchronized php-1.25wmf24/includes/jobqueue/JobSpecification.php: Avoid using local main page title in JobSpecification (duration: 00m 12s) [23:27:16] (I'm back now.) 
[23:27:38] !log hoo Synchronized php-1.26wmf1/includes/jobqueue/JobSpecification.php: Avoid using local main page title in JobSpecification (duration: 00m 12s) [23:28:13] I think the bot logging stuff is down [23:28:24] * hoo seeks morebots [23:29:28] (03PS1) 10Legoktm: Add 'CentralAuthSULRename' log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/203992 [23:29:39] PROBLEM - configured eth on labvirt1003 is CRITICAL: eth1 reporting no carrier. [23:30:10] PROBLEM - puppet last run on labvirt1003 is CRITICAL Puppet has 2 failures [23:33:36] Verified my stuff [23:33:41] kaldari: Go ahead [23:33:52] thanks [23:34:13] (03PS7) 10GWicke: WIP: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) [23:35:11] RECOVERY - nova-compute process on labvirt1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [23:38:10] RECOVERY - nova-compute process on virt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [23:39:25] (03PS1) 10Andrew Bogott: Add hiera configs for labvirt1001-1009. [puppet] - 10https://gerrit.wikimedia.org/r/203993 [23:40:14] (03CR) 10BBlack: [C: 031] WIP: Set up /api/v1/ entry point for restbase [puppet] - 10https://gerrit.wikimedia.org/r/203871 (https://phabricator.wikimedia.org/T95229) (owner: 10GWicke) [23:40:46] (03PS2) 10Andrew Bogott: Add hiera configs for labvirt1001-1009. [puppet] - 10https://gerrit.wikimedia.org/r/203993 [23:41:58] (03CR) 10Andrew Bogott: [C: 032] Add hiera configs for labvirt1001-1009. 
[puppet] - 10https://gerrit.wikimedia.org/r/203993 (owner: 10Andrew Bogott) [23:42:56] RECOVERY - puppet last run on labvirt1001 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [23:43:57] PROBLEM - nova-compute process on labvirt1005 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/nova-compute [23:45:07] RECOVERY - puppet last run on labvirt1003 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [23:46:10] hoo: still waiting for Jenkins to merge my patch :P It’s been building for 30 minutes now [23:46:47] kaldari: how many server kittens have you dispatched in sacrifice to the jenkins god? [23:47:04] you need at least one for every two thousand lines of code commits. [23:47:56] robh: where am I going to find fresh server kittens at this time of night? [23:48:47] RECOVERY - nova-compute process on labvirt1005 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [23:49:04] Hello all, what is a good place to install and run jouncebot to remind mobile front end folks about deploying their software? Is there any instance that I can use to do so? [23:49:26] PROBLEM - Host labvirt1001 is DOWN: PING CRITICAL - Packet loss = 100% [23:49:37] RECOVERY - nova-compute process on labvirt1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [23:49:49] I'm sorry not deploying, but cutting a branch. 
[23:49:56] RECOVERY - nova-compute process on labvirt1006 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute [23:50:07] RECOVERY - puppet last run on labvirt1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:07] RECOVERY - puppet last run on labvirt1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:07] RECOVERY - puppet last run on labvirt1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [23:50:20] bmansurov: you mean explicitely not in labs? maybe soon in a VM on the ganeti cluster [23:50:46] mutante: it can be anywhere, I'd like to get notifications on our IRC channel. [23:51:24] mutante: can I set it up at tools-login.wmflabs.org? [23:51:45] bmansurov: Yes it can [23:51:46] bmansurov: in tool labs [23:51:50] yes [23:51:53] 6operations, 10ops-eqiad, 6Labs: labvirt100x boxes 'no carrier' on eth1 - https://phabricator.wikimedia.org/T95973#1205369 (10Andrew) 3NEW a:3Cmjohnson [23:52:00] bmansurov: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Configuring_bots_and_tools [23:52:24] mutante, JohnFLewis: thanks. could someone please grant me permissions? [23:52:26] RECOVERY - puppet last run on labvirt1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [23:53:07] RECOVERY - Host labvirt1001 is UPING OK - Packet loss = 0%, RTA = 0.98 ms [23:53:23] bmansurov: minding joining #wikimedia-labs ? probably a better channel :) [23:53:35] JohnFLewis: of course [23:54:04] i need to run, sorry, John can help you. thanks John! [23:54:10] mutante: thanks [23:54:48] ACKNOWLEDGEMENT - configured eth on labvirt1001 is CRITICAL: eth1 reporting no carrier. andrew bogott https://phabricator.wikimedia.org/T95973 [23:54:48] ACKNOWLEDGEMENT - configured eth on labvirt1002 is CRITICAL: eth1 reporting no carrier. 
andrew bogott https://phabricator.wikimedia.org/T95973 [23:54:48] ACKNOWLEDGEMENT - configured eth on labvirt1003 is CRITICAL: eth1 reporting no carrier. andrew bogott https://phabricator.wikimedia.org/T95973 [23:54:48] ACKNOWLEDGEMENT - configured eth on labvirt1004 is CRITICAL: eth1 reporting no carrier. andrew bogott https://phabricator.wikimedia.org/T95973 [23:54:48] ACKNOWLEDGEMENT - configured eth on labvirt1005 is CRITICAL: eth1 reporting no carrier. andrew bogott https://phabricator.wikimedia.org/T95973 [23:54:49] ACKNOWLEDGEMENT - configured eth on labvirt1006 is CRITICAL: eth1 reporting no carrier. andrew bogott https://phabricator.wikimedia.org/T95973