[00:00:24] * Dereckson notes some minutes are required for NewUserMessage to start.
[00:23:34] Blocked-on-Operations, operations, Continuous-Integration-Infrastructure, Scrum-of-Scrums, Patch-For-Review: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-8 or equivalent - https://phabricator.wikimedia.org/T88798#1303125 (hashar) Apparently all the...
[00:24:13] PROBLEM - puppet last run on cp3013 is CRITICAL puppet fail
[00:30:04] hoo: Respected human, time to deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150522T0030). Please do the needful.
[00:41:13] RECOVERY - puppet last run on cp3013 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures
[00:45:48] operations, Security-Team: Production cluster can't access labs cluster - https://phabricator.wikimedia.org/T95714#1303192 (Dereckson) Let's try to figure something. We could create a project similar to the tool server, but with a restricted scope and tailored to prevent facilities for tools with a missio...
[00:53:31] operations, Security-Team: Production cluster can't access labs cluster - https://phabricator.wikimedia.org/T95714#1303195 (csteipp) @Dereckson, that sounds like a good idea (assuming ops is ok with it). The only additions I would ask is that we make sure no one sets up a proxy there that forwards request...
[01:21:22] PROBLEM - puppet last run on db2044 is CRITICAL Puppet has 1 failures [01:36:23] RECOVERY - puppet last run on db2044 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [01:57:13] Change replication to git.wm.o is apparently broken [01:58:07] (03PS1) 10GWicke: Fix labs graphoid IP [puppet] - 10https://gerrit.wikimedia.org/r/212743 (https://phabricator.wikimedia.org/T99885) [02:19:53] PROBLEM - puppet last run on ms-fe1002 is CRITICAL Puppet has 1 failures [02:28:52] (03PS1) 10Gage: zookeeper-cleanup: don't generate cron email for normal operation [puppet/zookeeper] - 10https://gerrit.wikimedia.org/r/212746 [02:32:02] !log l10nupdate Synchronized php-1.26wmf6/cache/l10n: (no message) (duration: 05m 56s) [02:32:13] Logged the message, Master [02:34:54] RECOVERY - puppet last run on ms-fe1002 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [02:36:14] PROBLEM - are wikitech and wt-static in sync on silver is CRITICAL: wikitech-static CRIT - wikitech and wikitech-static out of sync (90737s 90000s) [02:36:34] 6operations, 10Wikimedia-Git-or-Gerrit: git.wikimedia.org replication from gerrit stopped or lags - https://phabricator.wikimedia.org/T99990#1303220 (10JanZerebecki) 3NEW [02:36:36] !log LocalisationUpdate completed (1.26wmf6) at 2015-05-22 02:35:33+00:00 [02:36:43] Logged the message, Master [02:38:18] !log hoo Synchronized php-1.26wmf7/extensions/Wikidata/: Update Wikidata from wmf4 to wmf6 branch. 
(duration: 00m 22s) [02:38:24] Logged the message, Master [02:43:54] !log hoo Synchronized php-1.26wmf6/extensions/Wikidata/: Update Wikidata: Make wbmergeitems respect the bot parameter (duration: 00m 19s) [02:44:00] Logged the message, Master [03:01:23] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [03:02:45] !log l10nupdate Synchronized php-1.26wmf7/cache/l10n: (no message) (duration: 10m 14s) [03:02:55] Logged the message, Master [03:09:54] !log LocalisationUpdate completed (1.26wmf7) at 2015-05-22 03:08:51+00:00 [03:10:00] Logged the message, Master [03:11:33] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [04:06:17] robh: around? [04:39:33] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [04:52:26] (03PS1) 10Springle: upgrade db1026 trusty mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/212747 [04:54:08] (03CR) 10Springle: [C: 032] upgrade db1026 trusty mariadb 10 [puppet] - 10https://gerrit.wikimedia.org/r/212747 (owner: 10Springle) [04:58:24] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [05:05:33] RECOVERY - are wikitech and wt-static in sync on silver is OK: wikitech-static OK - wikitech and wikitech-static in sync (11101 90000s) [05:07:03] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 6.67% of data above the critical threshold [500.0] [05:17:12] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [05:38:33] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1303306 (10DarTar) As the owner of the Research and Data backlog (recently migrated to Phab from Trello) I'd like to be able to create project... 
[05:50:39] !log upgrade db1026 trusty mariadb 10, mydumper reload [05:50:48] Logged the message, Master [05:59:28] (03PS1) 1020after4: Unit test to verify the existence of the proper branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212753 [05:59:34] (03CR) 10jenkins-bot: [V: 04-1] Unit test to verify the existence of the proper branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212753 (owner: 1020after4) [05:59:37] 6operations, 5wikis-in-codfw: Document what is left for having a full cluster installation in codfw - https://phabricator.wikimedia.org/T97322#1303325 (10Joe) p:5High>3Normal a:5Joe>3None [06:06:10] (03PS2) 1020after4: Unit test to verify the existence of the proper branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212753 (https://phabricator.wikimedia.org/T99886) [06:06:15] (03CR) 10jenkins-bot: [V: 04-1] Unit test to verify the existence of the proper branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212753 (https://phabricator.wikimedia.org/T99886) (owner: 1020after4) [06:17:57] 7Blocked-on-Operations, 6operations, 10Continuous-Integration-Infrastructure, 6Scrum-of-Scrums, 5Patch-For-Review: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-8 or equivalent - https://phabricator.wikimedia.org/T88798#1303337 (10Anomie) The other failures... [06:18:47] (03PS3) 1020after4: Unit test to verify the existence of the proper branch symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212753 (https://phabricator.wikimedia.org/T99886) [06:23:28] Does scap run phpunit or does it just do some basic linting? 
[06:25:26] I'll probably add a few more unit tests to mediawiki-config's phpunit suite, and even though jenkins runs the tests I'm thinking it would be best to have scap run them before doing a full sync [06:30:05] PROBLEM - puppet last run on mw1235 is CRITICAL Puppet has 1 failures [06:31:25] PROBLEM - puppet last run on logstash1002 is CRITICAL Puppet has 1 failures [06:32:44] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 1 failures [06:33:04] PROBLEM - puppet last run on db2040 is CRITICAL Puppet has 1 failures [06:33:05] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 1 failures [06:34:18] twentyafterfour: It doesn't [06:34:45] PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 1 failures [06:34:48] Also running integration (and that's what most of our tests are) in production is extremely dangerous [06:34:54] PROBLEM - puppet last run on mw1092 is CRITICAL Puppet has 1 failures [06:35:03] If there's a problem that should probably be addressed by enhancing jenkins [06:35:15] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures [06:35:25] PROBLEM - puppet last run on mw2079 is CRITICAL Puppet has 1 failures [06:35:34] PROBLEM - puppet last run on mw1166 is CRITICAL Puppet has 1 failures [06:35:35] hoo: Well the tests I'm writing are to validate the state of what's about to get sync' [06:35:38] sync'd [06:35:51] How is that supposed to work? [06:36:18] currently it's just to check that all the right branch symlinks are set up [06:36:29] mh, ok [06:36:32] Did you see https://phabricator.wikimedia.org/T99998 btw? [06:36:49] I guess that would have saved us from the hazzle with Wikidata earlier [06:36:56] Are you in Lyon, btw? [06:37:03] Or will you be [06:37:04] no but that is a very good idea (and I was going to implement that actually, already had it on my mental todo list) [06:37:11] no I'm in USA [06:37:36] yeah make-wmf-branch definitely needs to check that it's up to date. 
[06:37:48] or fetch the latest config from git [06:38:07] It should rather check whether it's up to date, that's less error prone [06:38:12] yeah [06:38:21] That was our guess on why the wrong Wikidata version was deployed [06:38:32] I guess you're the only one that can actually verify that [06:38:33] I'll implement that right now [06:38:53] I didn't see the wikidata issue but I'd say that's probably what happened [06:46:15] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:46:35] RECOVERY - puppet last run on db2040 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:46:35] RECOVERY - puppet last run on logstash1002 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:46:44] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [06:46:54] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [06:47:15] RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [06:47:24] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:15] RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:25] RECOVERY - puppet last run on mw1092 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:28] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri May 22 06:47:25 UTC 2015 (duration 47m 23s) [06:48:35] Logged the message, Master [06:48:45] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [07:35:56] 6operations: Remove ntfs3g/fuse, review needlessly installed packages - https://phabricator.wikimedia.org/T100004#1303405 (10MoritzMuehlenhoff) 3NEW a:3MoritzMuehlenhoff [07:36:12] 
6operations: Remove ntfs3g/fuse, review needlessly installed packages - https://phabricator.wikimedia.org/T100004#1303413 (10MoritzMuehlenhoff) p:5Triage>3Normal [07:49:58] (03PS1) 10Jcrespo: Bringing db1036 temporarelly down for maintenance, using db1018 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212760 [07:55:04] (03CR) 10Jcrespo: [C: 032] Bringing db1036 temporarelly down for maintenance, using db1018 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212760 (owner: 10Jcrespo) [07:58:41] !log jynus Synchronized wmf-config/db-eqiad.php: depool db1036 (duration: 00m 13s) [07:58:48] Logged the message, Master [08:06:45] (03PS2) 10Mobrovac: Fix labs graphoid IP [puppet] - 10https://gerrit.wikimedia.org/r/212743 (https://phabricator.wikimedia.org/T99885) (owner: 10GWicke) [08:06:56] (03CR) 10Mobrovac: [C: 031] Fix labs graphoid IP [puppet] - 10https://gerrit.wikimedia.org/r/212743 (https://phabricator.wikimedia.org/T99885) (owner: 10GWicke) [08:07:28] godog, akosiaris: around? [08:07:42] mobrovac: yup [08:08:02] mind review/merge https://gerrit.wikimedia.org/r/#/c/212743/ ? [08:08:11] just a simple config change for restbase in beta [08:13:30] (03CR) 10Yuvipanda: [C: 04-1] "I think for now limiting this to prod would be the right thing to do." [puppet] - 10https://gerrit.wikimedia.org/r/199936 (owner: 10Chad) [08:13:51] (03CR) 10Alexandros Kosiaris: [C: 032] Fix labs graphoid IP [puppet] - 10https://gerrit.wikimedia.org/r/212743 (https://phabricator.wikimedia.org/T99885) (owner: 10GWicke) [08:14:46] akosiaris: cheers! [08:19:11] mobrovac: I forced a sync on the deployment puppetmaster as well [08:19:35] akosiaris: hehe, thnx was about to do that because the agents still had the older ver [08:19:45] redoing puppet-agent on hosts now [08:20:24] akosiaris: coming to lyon today? 
[08:21:04] hmmm, still not fetching the new ver [08:22:51] mobrovac: nope [08:23:00] but _joe_ is [08:23:15] ah [08:42:05] PROBLEM - nutcracker port on silver is CRITICAL - Socket timeout after 2 seconds [08:43:44] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212 [09:25:08] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1303444 (10Qgil) Done! (and this was my first productive action in Lyon!) :) [10:16:37] (03PS2) 10Alexandros Kosiaris: beta: Update the puppetmaster more often [puppet] - 10https://gerrit.wikimedia.org/r/210059 [10:30:59] (03PS3) 10Alexandros Kosiaris: beta: Update the puppetmaster more often [puppet] - 10https://gerrit.wikimedia.org/r/210059 [10:31:01] 7Blocked-on-Operations, 6Collaboration-Team, 10Echo, 6Scrum-of-Scrums, 7Schema-change: Perform schema change to echo_target_page changing from a 1 to 1 mapping between pages and user/notification to a 1 to many. - https://phabricator.wikimedia.org/T94427#1303502 (10Springle) p:5Normal>3High [10:31:29] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] beta: Update the puppetmaster more often [puppet] - 10https://gerrit.wikimedia.org/r/210059 (owner: 10Alexandros Kosiaris) [10:34:12] 7Blocked-on-Operations, 6Collaboration-Team, 10Echo, 6Scrum-of-Scrums, 7Schema-change: Perform schema change to echo_target_page changing from a 1 to 1 mapping between pages and user/notification to a 1 to many. - https://phabricator.wikimedia.org/T94427#1303505 (10Springle) a:5Springle>3jcrespo [10:35:42] matt_flaschen: jynus will sort that out for you [10:35:54] springle, thanks. [10:36:00] whoa, different channel [10:36:53] springle, do you prefer using the https://phabricator.wikimedia.org/T51188 tracking bug, https://phabricator.wikimedia.org/tag/schema-change/, both, or it doesn't matter? 
[10:37:55] matt_flaschen: we're supposed to be switching to a tag/board, but the tracking bug is still canonical [10:38:32] Okay, then we probably should've added this one earlier than two days ago... [10:38:37] It was only on the tag. [10:38:54] that ambiguity is my fault [10:38:57] (03PS1) 10Jcrespo: Repooling db1036 and setting it back as the logpager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212772 [10:42:28] (03CR) 10Jcrespo: [C: 04-1] Repooling db1036 and setting it back as the logpager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212772 (owner: 10Jcrespo) [10:45:06] (03PS2) 10Jcrespo: Repooling db1036 and setting it back as the logpager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212772 [10:51:18] (03PS3) 10Jcrespo: Repooling db1036 and setting it back as the logpager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212772 [10:53:17] (03CR) 10Jcrespo: [C: 032] Repooling db1036 and setting it back as the logpager [mediawiki-config] - 10https://gerrit.wikimedia.org/r/212772 (owner: 10Jcrespo) [10:55:16] !log jynus Synchronized wmf-config/db-eqiad.php: repool db1036 (duration: 00m 12s) [10:55:27] Logged the message, Master [11:04:48] (03PS1) 10Alexandros Kosiaris: Remove inheritance from puppet::self::config [puppet] - 10https://gerrit.wikimedia.org/r/212773 [11:05:36] PROBLEM - puppet last run on mw2028 is CRITICAL puppet fail [11:05:38] (03CR) 10jenkins-bot: [V: 04-1] Remove inheritance from puppet::self::config [puppet] - 10https://gerrit.wikimedia.org/r/212773 (owner: 10Alexandros Kosiaris) [11:07:09] (03PS2) 10Alexandros Kosiaris: Remove inheritance from puppet::self::config [puppet] - 10https://gerrit.wikimedia.org/r/212773 [11:15:13] (03CR) 10Alexandros Kosiaris: [C: 032] Remove inheritance from puppet::self::config [puppet] - 10https://gerrit.wikimedia.org/r/212773 (owner: 10Alexandros Kosiaris) [11:24:25] RECOVERY - puppet last run on mw2028 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures 
[11:26:14] https://commons.wikimedia.org/wiki/File:Georg_Miretsky_TOM-1_%27Kantata_%3DWolyn-43%3D%27.pdf [11:26:18] [44b7e140] 2015-05-22 11:26:09: Fatal exception of type MWException [11:27:37] 6operations: [7a44ef6d] 2015-05-22 11:26:53: Fatal exception of type MWException - https://phabricator.wikimedia.org/T100012#1303531 (10Steinsplitter) 3NEW [11:28:12] 6operations: [7a44ef6d] 2015-05-22 11:26:53: Fatal exception of type MWException - https://phabricator.wikimedia.org/T100012#1303531 (10Steinsplitter) [11:31:05] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [11:42:24] hi [11:42:32] https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Logiz_de_la_Lune_Rousse-Affiche1900.jpg/180px-Logiz_de_la_Lune_Rousse-Affiche1900.jpg [11:42:40] any idea why this doesn't work? [11:42:48] the image is huge, but the thumb is small [11:42:58] is this a known bug? [11:43:18] should I create a new report? [11:44:49] https://commons.wikimedia.org/wiki/File:Logiz_de_la_Lune_Rousse-Affiche1900.jpg [11:44:57] 14,887 × 10,300 pixels, file size: 47.31 MB [11:45:05] FYI it works here: https://upload.wikimedia.org/wikipedia/commons/thumb/8/84/Logiz_de_la_Lune_Rousse-Affiche1900.jpg/120px-Logiz_de_la_Lune_Rousse-Affiche1900.jpg [11:46:26] and now: There have been too many recent failed attempts (4 or more) to render this thumbnail. Please try again later. [12:43:21] (03PS1) 10Alexandros Kosiaris: puppetmaster::logstash remove inclusion of base::puppet [puppet] - 10https://gerrit.wikimedia.org/r/212777 [12:45:49] (03CR) 10Alexandros Kosiaris: [C: 032] puppetmaster::logstash remove inclusion of base::puppet [puppet] - 10https://gerrit.wikimedia.org/r/212777 (owner: 10Alexandros Kosiaris) [12:47:15] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [12:49:48] twentyafterfour: Which is the channel to ask question about Phabricator? 
[12:50:13] or anyone else know :)
[12:51:58] kart_: #wikimedia-devtools
[13:30:50] (PS2) Jforrester: Disable VisualEditor A/B test pilot [mediawiki-config] - https://gerrit.wikimedia.org/r/211696
[13:49:35] PROBLEM - puppet last run on mw1150 is CRITICAL Puppet has 1 failures
[13:53:24] (CR) Ottomata: [C: 2] zookeeper-cleanup: don't generate cron email for normal operation [puppet/zookeeper] - https://gerrit.wikimedia.org/r/212746 (owner: Gage)
[13:55:07] (CR) Ottomata: "If anyone knows what PEP8 crime Jenkins thinks I am committing, let me know." [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[13:57:55] !log schema change on x1 shard https://phabricator.wikimedia.org/T94427 No downtime expected
[13:58:01] Logged the message, Master
[14:03:25] jgage: you can run pep8 on your own from the command line
[14:03:39] I dunno what's wrong with the CI output there, but the answer to your question is:
[14:03:43] bblack-mba:puppet bblack$ pep8 modules/varnish/files/varnishstats-diamond-collector.py
[14:03:45] modules/varnish/files/varnishstats-diamond-collector.py:55:5: E129 visually indented line with same indent as next logical line
[14:03:49] modules/varnish/files/varnishstats-diamond-collector.py:248:1: E265 block comment should start with '# '
[14:04:17] err sorry, ottomata ^
[14:04:55] RECOVERY - puppet last run on mw1150 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures
[14:09:56] ha, thanks bblack, i started to run pep8 on my own, but then stopped because of some misconfigured or misinstalled brew nastiness on my os x that I didn't feel like hunting down at the moment
[14:10:29] (PS10) Giuseppe Lavagetto: confd: create module [puppet] - https://gerrit.wikimedia.org/r/208399 (https://phabricator.wikimedia.org/T97974)
[14:12:22] <_joe_> ottomata: also, use autoflake8
[14:13:01] (PS1) Yuvipanda: wmflib: Add nameserver parameter to ipresolve function
[puppet] - 10https://gerrit.wikimedia.org/r/212784 [14:13:04] _joe_: ^ [14:13:27] no idea if it works yet [14:13:30] (03PS2) 10Yuvipanda: wmflib: Add nameserver parameter to ipresolve function [puppet] - 10https://gerrit.wikimedia.org/r/212784 (https://phabricator.wikimedia.org/T99833) [14:13:40] (03PS16) 10Ottomata: Add varnish request stats diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) [14:14:42] (03PS2) 10Giuseppe Lavagetto: wmflib: Move ipresolve function into wmflib [puppet] - 10https://gerrit.wikimedia.org/r/212464 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [14:14:59] <_joe_> yuvipanda: should I merge ^^ [14:15:12] _joe_: has a dependent patch, let me move it around [14:15:33] <_joe_> yeah this one is pretty much a non-issue I mean [14:15:36] (03CR) 10Alexandros Kosiaris: [C: 031] "1 inline question, rest LGTM" (031 comment) [debs/python-etcd] - 10https://gerrit.wikimedia.org/r/212528 (https://phabricator.wikimedia.org/T99771) (owner: 10Filippo Giunchedi) [14:15:44] _joe_: yup [14:16:13] mmm, git so slow [14:16:16] on airport wifi [14:16:54] let me just merge that and babysite [14:16:58] *sit [14:16:59] (03PS5) 10Yuvipanda: labs: Set labs nameserver IP globally in $::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/212444 [14:17:36] _joe_: can you +1 ^? [14:18:05] <_joe_> yuvipanda: which one? [14:18:14] _joe_: $::nameservers [14:18:17] the one Ij ust rebased [14:18:18] *just [14:18:21] https://gerrit.wikimedia.org/r/#/c/212444/5 [14:18:55] that's what the wmflib move patch depends on and I figured easier to just merge and get it over with than drop that patch from dependencies [14:19:02] (because git is slow her) [14:21:59] 7Blocked-on-Operations, 6Collaboration-Team, 10Echo, 6Scrum-of-Scrums, 7Schema-change: Perform schema change to echo_target_page changing from a 1 to 1 mapping between pages and user/notification to a 1 to many. 
- https://phabricator.wikimedia.org/T94427#1303666 (10jcrespo) Converting right now the schem... [14:26:45] _joe_: am testing it on deployment-salt now (the $::nameservers patch) [14:27:52] ohmahgoodness i passed pep8 [14:28:17] godog: you can review again, i know you've been on the edge of your seat waiting [14:28:17] https://gerrit.wikimedia.org/r/#/c/212041/15 [14:29:01] (03PS3) 10Yuvipanda: wmflib: Move ipresolve function into wmflib [puppet] - 10https://gerrit.wikimedia.org/r/212464 (https://phabricator.wikimedia.org/T99833) [14:29:03] (03PS6) 10Yuvipanda: labs: Set labs nameserver IP globally in $::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/212444 [14:29:05] (03PS3) 10Yuvipanda: wmflib: Add nameserver parameter to ipresolve function [puppet] - 10https://gerrit.wikimedia.org/r/212784 (https://phabricator.wikimedia.org/T99833) [14:30:26] ottomata: haha sure I'll take a look, I am actually on a seat waiting [14:32:35] did I miss anything? [14:33:27] OK, the nameservers change is a noop, let me merge [14:33:41] (03PS7) 10Yuvipanda: labs: Set labs nameserver IP globally in $::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/212444 [14:33:59] (03CR) 10Yuvipanda: [C: 032 V: 032] labs: Set labs nameserver IP globally in $::nameservers [puppet] - 10https://gerrit.wikimedia.org/r/212444 (owner: 10Yuvipanda) [14:37:20] (03PS4) 10Yuvipanda: wmflib: Move ipresolve function into wmflib [puppet] - 10https://gerrit.wikimedia.org/r/212464 (https://phabricator.wikimedia.org/T99833) [14:37:38] (03CR) 10Yuvipanda: [C: 032 V: 032] wmflib: Move ipresolve function into wmflib [puppet] - 10https://gerrit.wikimedia.org/r/212464 (https://phabricator.wikimedia.org/T99833) (owner: 10Yuvipanda) [14:37:49] _joe_: I merged both of them, https://gerrit.wikimedia.org/r/#/c/212784/ left now [14:38:22] jgage: ^ vaguely touches the strongswan module, should be totally fine but let me know if it isn't [14:46:41] 6operations, 5Interdatacenter-IPsec: Strongswan: security 
association reauthentication failure - https://phabricator.wikimedia.org/T96111#1303691 (10BBlack) I see 5.3.0 on curium, does that mean that part is done now? [14:48:04] (03PS1) 10Alexandros Kosiaris: Assign role::postgres::maps to labsdb1004 [puppet] - 10https://gerrit.wikimedia.org/r/212786 [14:49:49] (03CR) 10Alexandros Kosiaris: [C: 032] Assign role::postgres::maps to labsdb1004 [puppet] - 10https://gerrit.wikimedia.org/r/212786 (owner: 10Alexandros Kosiaris) [14:52:57] yuvipanda: thanks, i'll check now [14:53:04] should be a total noop tho [14:53:30] _joe_: so I think https://gerrit.wikimedia.org/r/#/c/212784/ works (did some local tests) but not fully sure if my cache modifications are ok [14:53:48] heh yep that looks pretty noopy [14:54:29] cool [14:55:25] <_joe_> I'm looking at it right now yuvipanda [14:55:34] _joe_: sweet thanks [14:55:38] <_joe_> but evil bd808 lured me into a conversation [14:55:46] (03PS1) 10BBlack: test retry503 on upload-fe for load effects and reboot resiliency [puppet] - 10https://gerrit.wikimedia.org/r/212788 [14:56:07] * bd808 will stop being social in meat space [14:56:09] _joe_: in person? WHAT HAPPENED TO BEING REMOTE FRIENDLY, BD808? [14:56:10] :P [14:56:23] (03CR) 10BBlack: [C: 032] test retry503 on upload-fe for load effects and reboot resiliency [puppet] - 10https://gerrit.wikimedia.org/r/212788 (owner: 10BBlack) [14:56:31] there are people here without their laptops open [14:56:32] (03CR) 10BBlack: [V: 032] test retry503 on upload-fe for load effects and reboot resiliency [puppet] - 10https://gerrit.wikimedia.org/r/212788 (owner: 10BBlack) [14:56:46] jouncebot, next [14:56:46] In 0 hour(s) and 3 minute(s): VisualEditor A/B test config change (Special arrangement) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150522T1500) [14:57:02] Whee. 
[14:57:05] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [14:57:20] Krenair: Turns out I am here after all. Lufthansa are just /that/ amazing. [14:57:39] James_F: they just love you and want you to be present for the config change [14:57:47] yuvipanda: Clearly. :-) [14:59:20] 7Blocked-on-Operations, 6Collaboration-Team, 10Echo, 6Scrum-of-Scrums, 7Schema-change: Perform schema change to echo_target_page changing from a 1 to 1 mapping between pages and user/notification to a 1 to many. - https://phabricator.wikimedia.org/T94427#1303704 (10jcrespo) @Mattflaschen May 22 14:49:48... [15:00:05] Krenair, James_F: Dear anthropoid, the time has come. Please deploy VisualEditor A/B test config change (Special arrangement) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150522T1500). [15:00:07] (03CR) 10Alex Monk: [C: 032] Disable VisualEditor A/B test pilot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/211696 (owner: 10Jforrester) [15:00:13] (03Merged) 10jenkins-bot: Disable VisualEditor A/B test pilot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/211696 (owner: 10Jforrester) [15:01:23] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/211696/ - disable VE A/B test (duration: 00m 12s) [15:01:36] Logged the message, Master [15:02:06] James_F, ^ [15:02:06] (03CR) 10Filippo Giunchedi: "some comments but the rest LGTM" (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: 10Ottomata) [15:02:45] Krenair: Excellent. Not sure there's a sane way to test – absence of evidence isn't evidence of absence – but let's ping people that it's happened. [15:04:07] James_F, updated email thread, posted on task... [15:04:16] Thanks. :-) [15:04:16] stupid airport wifi [15:04:18] Does that resolve https://phabricator.wikimedia.org/T90666 ? 
[15:04:23] requires me to click a button every 30m
[15:06:50] godog: re metric name path prefix, you mean it would be better to just use 'path' on its own and not the other configs?
[15:07:10] i need to know the varnishname still, but i suppose i could decouple that from the path
[15:07:25] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[15:11:25] PROBLEM - puppet last run on mw2073 is CRITICAL Puppet has 1 failures
[15:12:53] Krenair: No, T90666 covers the real A/B test mostly.
[15:13:19] okay
[15:13:34] andrewbogott: oh, so ldap doesn't have public IP info
[15:13:53] hmm so can't get translator info from there
[15:13:57] need to get it from nova, maybe
[15:15:06] (CR) Yuvipanda: "LDAP apparently doesn't have public IP info, so we'll have to hit nova directly to autogenerate them (should come in a later patch, I gues" [puppet] - https://gerrit.wikimedia.org/r/211905 (owner: Andrew Bogott)
[15:18:18] ottomata: yeah the more predictable we can make it the better, but I see what you mean all of that must be from the config anyway
[15:19:09] ottomata: I'll clarify my comment, anyways that's also related to the comment re: diamond::collector and what to do with frontend/backend
[15:21:52] (CR) Filippo Giunchedi: Add varnish request stats diamond collector (1 comment) [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[15:24:18] yeah
[15:24:39] godog: not sure what you mean by the 'shared_dict could change size' comment either
[15:24:42] why is that liable to fail
[15:24:46] yes it will change size, but that's ok, no?
[15:24:55] collect() is just run periodically
[15:24:58] right?
[15:25:06] won't it just iterate through the items each time
[15:25:06] ?
[15:27:56] ottomata: heh good question, I _think_ the dict shouldn't change size while something is iterating over it
[15:29:24] oh WHILE it is iterating
[15:29:26] hm.
[15:29:31] yeah dunno what happens there
[15:29:37] gotta go
[15:29:44] it could be weird, but it doesn't seem like it would really be a problem
[15:29:51] maybe it would miss a key during a run?
[15:30:05] RECOVERY - puppet last run on mw2073 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:30:09] ok, laters, thanks
[15:31:18] ottomata: godog dicts can't change when they are being iterated over
[15:31:23] you will get a > RuntimeError: dictionary changed size during iteration
[15:32:31] yuvipanda: this is a Manager DictProxy
[15:32:58] ah nvm then
[15:34:31] <_joe_> akosiaris: https://gerrit.wikimedia.org/r/#/c/212789/
[15:34:48] <_joe_> are you able to rebuild hhvm-luasandbox with this?
[15:34:54] <_joe_> if not, I'll do it
[15:40:04] PROBLEM - puppet last run on mw2117 is CRITICAL Puppet has 1 failures
[15:48:03] operations, Architecture, MediaWiki-RfCs, RESTBase, and 5 others: RFC: Re-evaluate varnish-level request-restart behavior on 5xx - https://phabricator.wikimedia.org/T97206#1303730 (BBlack) Not directly as part of sorting out this whole affair, but more as a pragmatic matter, I've turned on retry50...
[15:50:29] operations, Traffic: Reboot caches for kernel 3.19.6 globally - https://phabricator.wikimedia.org/T96854#1303733 (BBlack) https://gerrit.wikimedia.org/r/#/c/212788/ added retry503 behavior for a single request-restart in the upload-frontend case only on 503 result. This killed the spikes on upload cache...
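Note on the 15:31 exchange about dicts changing size during iteration: a minimal sketch, assuming a plain Python dict rather than the multiprocessing manager proxy that the collector in the log apparently uses (proxy behaviour is not modelled here and may differ). The function names are hypothetical and exist only for illustration. Adding keys while iterating raises the RuntimeError quoted in the log; iterating over a snapshot of the items avoids it.

```python
def grow_while_iterating(d):
    """Add a key while iterating a plain dict; return the error text, if any."""
    try:
        for key in d:
            d[key + "_copy"] = d[key]  # changes the dict's size mid-iteration
    except RuntimeError as exc:
        return str(exc)
    return None


def grow_over_snapshot(d):
    """Iterate over a list snapshot of the items, so inserting keys is safe."""
    for key, value in list(d.items()):
        d[key + "_copy"] = value
    return d


if __name__ == "__main__":
    print(grow_while_iterating({"hits": 1}))
    print(grow_over_snapshot({"hits": 1}))
```

Taking the snapshot costs one extra copy per pass, which is usually acceptable for a periodically run collect()-style loop; whether the collector discussed above needs it depends on how its shared dict is actually implemented.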
[15:55:14] RECOVERY - puppet last run on mw2117 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures
[16:01:31] (PS1) Yuvipanda: shinken: Remove myself from some contactgroups [puppet] - https://gerrit.wikimedia.org/r/212791
[16:01:46] (PS2) Yuvipanda: shinken: Remove myself from some contactgroups [puppet] - https://gerrit.wikimedia.org/r/212791
[16:01:58] (CR) Yuvipanda: [C: 2 V: 2] shinken: Remove myself from some contactgroups [puppet] - https://gerrit.wikimedia.org/r/212791 (owner: Yuvipanda)
[16:16:03] (PS4) Yuvipanda: wmflib: Add nameserver parameter to ipresolve function [puppet] - https://gerrit.wikimedia.org/r/212784 (https://phabricator.wikimedia.org/T99833)
[16:16:05] (PS1) Yuvipanda: tools: Make redis failover-able [puppet] - https://gerrit.wikimedia.org/r/212792 (https://phabricator.wikimedia.org/T99737)
[16:16:08] _joe_: ^ my endgame for the ipresolve changes
[16:16:14] (the tools redis patch)
[16:16:40] (PS1) Gage: merge zookeeper submodule update I1fa0312aedf57c05dfd326b253b9f732abd4c20b [puppet] - https://gerrit.wikimedia.org/r/212793
[16:16:53] (PS2) Gage: merge zookeeper submodule update I1fa0312aedf57c05dfd326b253b9f732abd4c20b [puppet] - https://gerrit.wikimedia.org/r/212793
[16:17:48] (CR) Gage: [C: 2] merge zookeeper submodule update I1fa0312aedf57c05dfd326b253b9f732abd4c20b [puppet] - https://gerrit.wikimedia.org/r/212793 (owner: Gage)
[16:26:05] PROBLEM - High load average on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0]
[16:30:11] (PS1) Yuvipanda: ssh: Make hba enable-able via hiera [puppet] - https://gerrit.wikimedia.org/r/212794
[16:31:15] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[16:31:25] (PS2) Yuvipanda: ssh: Make hba enable-able via hiera [puppet] - https://gerrit.wikimedia.org/r/212794 (https://phabricator.wikimedia.org/T98714)
[16:32:05] (PS3) Yuvipanda: ssh: Make hba enable-able via hiera [puppet] - https://gerrit.wikimedia.org/r/209993 (https://phabricator.wikimedia.org/T98714)
[16:32:23] (Abandoned) Yuvipanda: ssh: Make hba enable-able via hiera [puppet] - https://gerrit.wikimedia.org/r/212794 (https://phabricator.wikimedia.org/T98714) (owner: Yuvipanda)
[16:38:11] operations, Traffic, HTTPS: implement Public Key Pinning (HPKP) for Wikimedia domains - https://phabricator.wikimedia.org/T92002#1303806 (BBlack) I think the ideal situation we should aim for in the long term is that we'd sign 3x leaf keys roughly as discussed in various recommendations: 1. The one cur...
[16:51:24] operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1303823 (yuvipanda) I feel like an idiot - I upgraded the kernel on labvirt100*6* - thankfully caught myself before doing a reboot. I'll let it be, and just upgrade 1005 as well.
[16:53:33] !log rebooted labvirt1005 for T99738
[16:53:43] Logged the message, Master
[16:55:55] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100%
[16:57:46] ACKNOWLEDGEMENT - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% Yuvi Panda T99738
[17:05:18] hmm
[17:08:26] ^I know a couple of people that would be looking bad to you :-)
[17:09:09] BTW, yuvipanda, are you an openstack master?
[17:09:15] not even close :)
[17:09:29] you're looking for andrewbogott
[17:09:38] ok, then
[17:09:54] I will have to wait until next week
[17:10:00] hmm, labvirt1005 looks like it isn't coming up
[17:10:04] and I can't get in on mgmt
[17:10:30] ok, it happened to me with other host, it has to be restarted on DC
[17:10:31] ah i can
[17:10:37] I forgot the root@
[17:10:49] Ops, Phabricator doesn't like me :(
[17:10:50] uh
[17:10:52] prompt is (initramfs)
[17:11:37] (Mainly when I search 'mediawiki 1.25' - I get 503d)
[17:11:42] ALERT! /dev/disk/by-uuid/861a4750-9243-4da7-b566-8c3cebfd6114 does not exist. Dropping to a shell!
[17:11:43] sigh
[17:12:53] operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1303843 (yuvipanda) Whoops - 1005 isn't coming back up. In shell with: ```ALERT! /dev/disk/by-uuid/861a4750-9243-4da7-b566-8c3cebfd6114 does not exist. Dropping to a shell!```...
[17:13:19] Lcawte, people have been changing something related to searches, but I do not know what exactly
[17:13:24] it may be related
[17:16:14] I have to restart, and probably will take a break, later!
[17:16:37] !log rebooted labvirt1005 from mgmt see what's up with disk array
[17:16:46] Logged the message, Master
[17:27:44] operations, Labs: labvirt1005 doesn't boot up - https://phabricator.wikimedia.org/T100030#1303871 (yuvipanda) NEW
[17:44:48] operations, Phabricator, database: Phabricator database access for Joel Aufrecht - https://phabricator.wikimedia.org/T99295#1303898 (JAufrecht) I have a script working off of a sample Phab database, and a window of free time today through next Tue/Wed, so the timing is perfect if there's any way to move...
[17:48:37] Ops-Access-Requests, operations, Project-Creators: Create a Phabricator project for 'Partnerships' - https://phabricator.wikimedia.org/T99945#1303907 (RobH) @SVentura, The #ops-access-requests is not the proper project for this. The process for requesting new projects is outlined on http://www.mediaw...
[17:52:33] Ops-Access-Requests, operations, Release-Engineering, Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1303913 (RobH) Open>stalled @Aklapper, Since you are likely busy with hackathon and the like, I've placed this task to stalled. No one i...
[17:53:34] operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1303915 (RobH)
[18:07:45] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[18:17:45] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[18:43:12] Ops-Access-Requests, operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1303955 (Dzahn) >>! In T98283#1284109, @Wwes wrote: > While we wait for a simple non tedious manual solve, how about we start with these for wikipedias: > en, de, zh, ru, it, es, fr, ja, pt, tr, n...
[18:55:37] Ops-Access-Requests, operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1303990 (ArielGlenn) I added them to http and not https, as I mentioned in an earlier comment. But the ticket is not 'done'.
[19:04:30] operations: Google Webmaster Tools - 1000 domain limit - https://phabricator.wikimedia.org/T99132#1304001 (Dzahn)
[19:04:32] Ops-Access-Requests, operations: Additional Webmaster tools access - https://phabricator.wikimedia.org/T98283#1304000 (Dzahn)
[19:11:50] operations, Deployment-Systems, Patch-For-Review: need package 'trebuchet-trigger' for trusty - https://phabricator.wikimedia.org/T99919#1304005 (Dzahn) a:Dzahn
[19:18:07] !log ori Synchronized php-1.26wmf7/includes/profiler: a69ee4a0f7, a3773b4d8b, ab19be9d99: Profiler improvements (duration: 00m 15s)
[19:18:13] Logged the message, Master
[19:19:05] !log ori Synchronized php-1.26wmf6/includes/profiler: 0d9c4dd8fe, ec22d6e6c3, 4127b1a315: Profiler improvements (duration: 00m 16s)
[19:19:10] Logged the message, Master
[19:22:22] (PS4) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[19:26:26] (PS1) Ori.livneh: Re-enable xhprof profiling [mediawiki-config] - https://gerrit.wikimedia.org/r/212808 (https://phabricator.wikimedia.org/T66301)
[19:26:57] (CR) Ori.livneh: [C: -1] "Pending review / coordination with @godog." [mediawiki-config] - https://gerrit.wikimedia.org/r/212808 (https://phabricator.wikimedia.org/T66301) (owner: Ori.livneh)
[19:36:23] (CR) Ori.livneh: [C: 1] "LGTM. Thanks for taking care of this." [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919) (owner: Dzahn)
[19:38:22] (CR) Dzahn: "it doesn't like this version string with the ~ in it." [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919) (owner: Dzahn)
[19:39:51] operations, ops-eqiad: analytics1036 can't talk cross row? - https://phabricator.wikimedia.org/T99845#1304032 (Ottomata) Any insights?
[19:45:10] operations, ops-eqiad: analytics1036 can't talk cross row? - https://phabricator.wikimedia.org/T99845#1304047 (Gage) I recommend checking all bios settings against 1035 for accidental changes (I can't imagine what setting would cause this, but we might as well rule it out), which will also result in reboo...
[19:48:10] ^ also, racadm serveraction powercycle to (hopefully) reboot the nic
[19:48:29] i have seen the kernel put nics in weird states that persisted across warm reboots before
[19:50:42] (i'll be good and put that in the task..)
[19:51:41] (PS5) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[19:51:43] operations, ops-eqiad: analytics1036 can't talk cross row? - https://phabricator.wikimedia.org/T99845#1304048 (Gage) Also let's try to ensure the NIC gets rebooted. My expectation is that racadm serveraction powercycle will do it. I've seen NICs get into weird states that persisted across warm reboots bef...
[19:52:15] operations, ops-eqiad: analytics1036 can't talk cross row? - https://phabricator.wikimedia.org/T99845#1304049 (Ottomata) Ok, thanks. I won’t be able to get to that today. Will do on Tuesday.
[19:52:39] (PS6) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[19:57:38] (PS7) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[20:03:07] (PS8) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[20:20:45] (CR) Ottomata: Add varnish request stats diamond collector (5 comments) [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[20:21:18] (PS17) Ottomata: Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580)
[20:24:45] PROBLEM - DPKG on mira is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[20:35:44] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[20:46:38] (CR) Ori.livneh: "It'll be difficult to prevent this from deadlocking, IMO, given the rate of output of varnishncsa. It'd be better (and not much harder) to" (13 comments) [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[20:48:00] (PS18) Ottomata: Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580)
[20:54:24] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[20:57:35] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL host 208.80.154.196, interfaces up: 228, down: 1, dormant: 0, excluded: 0, unused: 0 xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]
[20:59:09] !log ori Synchronized php-1.26wmf7/includes: 4632aff034 (duration: 00m 18s)
[20:59:18] Logged the message, Master
[20:59:52] operations, Labs, ToolLabs-Goals-Q4: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1304100 (Andrew) For the record: I've seen this once before. During an earlier labvirt crash I tried updating to a non 3.13 kernel (.15 I think?) and that box couldn't see its f...
[21:07:50] is git.wikimedia.org feeling okay? http://git.wikimedia.org/summary/mediawiki%2Fextensions%2FWikibaseRepository.git is showing 0 commits, which I assume is wrong
[21:12:13] (CR) Ottomata: "Really? The rate on individual varnish nodes isn't that high, is it? Peak 6000 / sec on bits nodes, if I remember correctly." [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[21:13:05] MC8: is this it instead? https://git.wikimedia.org/tree/mediawiki%2Fextensions%2FWikibase
[21:19:17] probably, I'm rather lost :)
[21:20:49] ori: thanks for comments, i'm going through them now
[21:20:53] lots of really good ones.
[21:21:19] q, do you think the httplib.responses thing is worth it? it is slicker, but it'll probably be slightly less performant?
[21:23:21] MC8: try this: go to gerrit.wikimedia.org and in the search bar type project:mediawiki/extensions/Wikibase and wait for the autocomplete
[21:25:12] MC8: git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Wikibase (seems like the "Repository" one is like a misnomer ?)
[21:40:43] (PS19) Ottomata: Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580)
[21:41:25] (CR) jenkins-bot: [V: -1] Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[21:42:33] (PS20) Ottomata: Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580)
[21:43:42] (CR) Ottomata: Add varnish request stats diamond collector (13 comments) [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[21:58:45] PROBLEM - puppet last run on sca1001 is CRITICAL puppet fail
[22:03:06] PROBLEM - puppet last run on mw2072 is CRITICAL puppet fail
[22:18:54] (PS1) John F. Lewis: mira: add to salt_peer_run [puppet] - https://gerrit.wikimedia.org/r/212822
[22:18:55] RECOVERY - puppet last run on sca1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:19:15] mutante ^^
[22:20:07] JohnLewis: oh!:) looking
[22:21:44] RECOVERY - puppet last run on mw2072 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures
[22:25:16] PROBLEM - puppet last run on nescio is CRITICAL puppet fail
[22:34:15] RECOVERY - DPKG on mira is OK: All packages OK
[22:34:42] DPKG on mira is me
[22:35:07] nescio is not
[22:35:39] Notice: /Stage[main]/Deployment::Deployment_server/User[trebuchet]/groups: groups changed '' to 'wikidev'
[22:35:42] Notice: /Stage[main]/Deployment::Deployment_server/Package[trebuchet-trigger]/ensure: ensure changed 'purged' to 'present'
[22:35:45] JohnLewis: ^
[22:35:51] ori:
[22:36:07] mutante: sweet
[22:37:52] (CR) Dzahn: [C: 2] "Notice: /Stage[main]/Deployment::Deployment_server/Package[trebuchet-trigger]/ensure: ensure changed 'purged' to 'present'" [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919) (owner: Dzahn)
[22:38:31] (PS9) Dzahn: bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919)
[22:39:47] (CR) Dzahn: [V: 2] bump version to 0.5.6-1, build for trusty [software/deployment/trebuchet-trigger] - https://gerrit.wikimedia.org/r/212707 (https://phabricator.wikimedia.org/T99919) (owner: Dzahn)
[22:42:05] RECOVERY - puppet last run on nescio is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[22:43:43] operations, Deployment-Systems, Patch-For-Review: need package 'trebuchet-trigger' for trusty - https://phabricator.wikimedia.org/T99919#1304168 (Dzahn) adjusted python-git dependency, built 0.5.6 for trusty, added to repo, puppet installed it on mira ``` Notice: /Stage[main]/Deployment::Deployment_...
[22:46:21] (CR) Hashar: [C: 2] Created .gitignore [dumps] - https://gerrit.wikimedia.org/r/207699 (owner: Dereckson)
[22:47:12] operations, Deployment-Systems, Patch-For-Review: need package 'trebuchet-trigger' for trusty - https://phabricator.wikimedia.org/T99919#1304169 (Dzahn) @cscott for completeness: the "import to our apt repo"-step was to scp all the files to host carbon, into /srv/wikimedia/incoming, then become root...
[22:47:38] operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1304171 (Dzahn)
[22:47:40] operations, Deployment-Systems, Patch-For-Review: need package 'trebuchet-trigger' for trusty - https://phabricator.wikimedia.org/T99919#1304170 (Dzahn) Open>Resolved
[22:49:00] operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1190079 (Dzahn) since the blocking task above was solved, the trebuchet-trigger package is now installed
[22:59:06] (CR) Dzahn: [C: 2] mira: add to salt_peer_run [puppet] - https://gerrit.wikimedia.org/r/212822 (owner: John F. Lewis)
[23:10:34] operations: install/deploy mira as codfw deployment server - https://phabricator.wikimedia.org/T95436#1304173 (Dzahn) added to salt_peer_run per John's patch: https://gerrit.wikimedia.org/r/212822 -- remaining issue: ``` Notice: /Stage[main]/Deployment::Deployment_server/Exec[eventual_consistency_deployme...
[23:17:56] (PS2) Dzahn: openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/211356
[23:20:25] PROBLEM - puppet last run on mw2018 is CRITICAL puppet fail
[23:23:41] (CR) Dzahn: [C: 2] phabricator: Add priority keywords/labels for !priority email command [puppet] - https://gerrit.wikimedia.org/r/209445 (https://phabricator.wikimedia.org/T98356) (owner: Merlijn van Deen)
[23:26:45] (CR) Ori.livneh: [C: 1] Add varnish request stats diamond collector [puppet] - https://gerrit.wikimedia.org/r/212041 (https://phabricator.wikimedia.org/T83580) (owner: Ottomata)
[23:31:11] (CR) Dzahn: [C: 2] Add mira to mediawiki-installation dsh [puppet] - https://gerrit.wikimedia.org/r/210938 (https://phabricator.wikimedia.org/T95436) (owner: John F. Lewis)
[23:37:05] RECOVERY - puppet last run on mw2018 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures
[23:40:33] (PS5) Dzahn: dnsrecursor: ensure => 'present' rather than 'latest' [puppet] - https://gerrit.wikimedia.org/r/211060 (owner: Andrew Bogott)
[23:47:15] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[23:59:19] (CR) Dzahn: [C: 2] contint: packages for Android SDK [puppet] - https://gerrit.wikimedia.org/r/210177 (https://phabricator.wikimedia.org/T88494) (owner: Hashar)