[00:02:29] hmm, uhh?
[00:02:52] !log analytics1041 down, attempting power cycle
[00:02:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:04:18] operations, Wikimedia-Mailing-lists: rename wikitech-announce.disabled.T100503 - https://phabricator.wikimedia.org/T109393#1547832 (JohnLewis) a:Dzahn>RobH Hate to be a pain but since you're doing this already tomorrow (let's look at expanding the windows to 2 hours perhaps to be safe), could you ha...
[00:05:31] operations, Wikimedia-Mailing-lists: rename wikitech-announce.disabled.T100503 - https://phabricator.wikimedia.org/T109393#1547837 (Dzahn) fyi; [-+_.=a-z0-9] is what lists can be
[00:06:02] RECOVERY - Host analytics1041 is UPING OK - Packet loss = 0%, RTA = 7.19 ms
[00:06:02] RECOVERY - DPKG on analytics1041 is OK: All packages OK
[00:06:03] RECOVERY - YARN NodeManager Node-State on analytics1041 is OK YARN NodeManager analytics1041.eqiad.wmnet:8041 Node-State: RUNNING
[00:06:03] RECOVERY - Disk space on analytics1041 is OK: DISK OK
[00:06:23] RECOVERY - salt-minion processes on analytics1041 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[00:06:23] RECOVERY - Hadoop DataNode on analytics1041 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[00:06:23] RECOVERY - puppet last run on analytics1041 is OK Puppet is currently enabled, last run 41 minutes ago with 0 failures
[00:06:24] RECOVERY - Disk space on Hadoop worker on analytics1041 is OK: DISK OK
[00:06:24] RECOVERY - dhclient process on analytics1041 is OK: PROCS OK: 0 processes with command name dhclient
[00:06:24] RECOVERY - Hadoop NodeManager on analytics1041 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[00:06:43] RECOVERY - RAID on analytics1041 is OK optimal, 13 logical, 14 physical
[00:07:12] RECOVERY - SSH on analytics1041 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.2 (protocol 2.0)
[00:07:43] RECOVERY - configured eth on analytics1041 is OK - interfaces up
[00:13:18] (PS1) Brion VIBBER: Remove extra transcode enablings; no longer needed [mediawiki-config] - https://gerrit.wikimedia.org/r/232228
[00:16:36] operations, Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1547848 (Dzahn) NEW a:Dzahn
[00:17:21] operations, Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1547848 (Dzahn) was it on purpose or by accident that archives are deleted? let's ask Philippe
[00:21:44] operations, Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1547856 (JohnLewis) Asked James on IRC, he's going to follow a response up. +cc Philippe and James.
[00:22:07] what's the history behind staff vs. wmfall?
[00:22:42] staff was the "we don't have to worry about contractors because there are basically none of them, we can just exclude them" list
[00:22:58] And also the "we don't have to worry about non-SF people, there are so few of them" list
[00:23:23] So eventually in 2011 or 2012 or something, they split that list into wmfsf and wmfall and expanded membership
[00:24:07] Because people got fed up with the dysfunction of one group of people that complaining that they didn't get important emails because they weren't technically staff, and another group complaining that they got too many emails about birthday cakes
[00:24:29] So if archives are imported, who gets access to them?
[00:24:36] hah!
[00:24:37] wmfall?
[00:24:38] Good question
[00:24:49] Another subscriber list to keep up to date?
[00:24:53] Surely not just old subscribers?
[00:25:15] But am I understanding correctly that contractors probably wouldn't be included?
[00:25:28] The number of people that were subscribed to that list who are still around at WMF today is probably <10
[00:25:37] I would argue it should be open to the same people that are on wmfall
[00:25:46] (Can I quote all of this on the ticket? :P)
[00:25:50] But it's not obvious that that should be the case
[00:26:15] operations: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1547861 (JKrauska) NEW
[00:26:21] But I see on the ticket that Phillipe and James A are going to be consulted, and I trust their judgment
[00:26:37] operations, Mail: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1547868 (Krenair)
[00:27:22] RoanKattouw: yeah. they're only being consulted because we have no idea why the list is still kept today and if importing anything is of any use.
[00:27:39] at most; either the list will go or those 10 people will be able to read everything they once could :)
[00:28:55] RoanKattouw, wmfreqs as well I think?
[00:29:07] wmfreqs is newer
[00:29:20] Post-2012 I think?
[00:29:20] Such a useless listinfo page
[00:29:38] Or maybe it was created around the same time as the wmfall/wmfsf refactor
[00:30:08] OK looks like I joined the wmfall list on 2011-08-24
[00:32:13] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 10.00% of data above the critical threshold [500.0]
[00:41:52] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[00:45:18] operations, Wikimedia-Mailing-lists: rename wikitech-announce.disabled.T100503 - https://phabricator.wikimedia.org/T109393#1547886 (RobH) This one should be able to be handled post-window, as its disabled. Being disabled, we won't have to halt mailman. (Chatted with John about this just now in irc.) So...
[00:45:51] operations, Wikimedia-Mailing-lists: rename wikitech-announce.disabled.T100503 - https://phabricator.wikimedia.org/T109393#1547887 (RobH) wouldnt the rename simply change the T to t, not remove it?
[00:46:25] operations, Wikimedia-Mailing-lists: rename wikitech-announce.disabled.T100503 - https://phabricator.wikimedia.org/T109393#1547888 (JohnLewis) anything works. The T just needs to go as a capital.
[00:47:33] Yeah wmfreqs is from March 2013
[00:49:12] RECOVERY - check_apache2 on payments2003 is OK: PROCS OK: 6 processes with command name apache2
[00:49:12] RECOVERY - check_puppetrun on payments2003 is OK Puppet is currently enabled, last run 75 seconds ago with 0 failures
[00:49:22] RECOVERY - check_apache2 on payments2002 is OK: PROCS OK: 6 processes with command name apache2
[00:49:22] RECOVERY - check_puppetrun on payments2002 is OK Puppet is currently enabled, last run 264 seconds ago with 0 failures
[00:49:22] RECOVERY - check_apache2 on payments2001 is OK: PROCS OK: 6 processes with command name apache2
[00:49:23] RECOVERY - check_puppetrun on payments2001 is OK Puppet is currently enabled, last run 112 seconds ago with 0 failures
[01:05:59] operations, Wikimedia-Mailing-lists: Ensure mailman VM setup has adequate entropy for STARTTLS - https://phabricator.wikimedia.org/T109239#1547906 (Dzahn) re: hardware RNG: < Dagmar> mutante: Ask they guy running the VM host. If he starts laughing halfway through your question, that hardware isn't prese...
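Dzahn's note in the rename thread gives the characters a Mailman list name may contain as `[-+_.=a-z0-9]`. The following is a minimal Python sketch of that rule. It assumes the quoted character class is the entire constraint (Mailman may enforce more), and `is_valid_list_name` is a hypothetical helper. The sketch shows why the capital `T` in `wikitech-announce.disabled.T100503` has to go, and why lowercasing it, as RobH suggests, would be enough:

```python
import re

# Assumption: the character class quoted on T109393, [-+_.=a-z0-9], is the
# only constraint on list names. Mailman itself may enforce more than this.
LIST_NAME_RE = re.compile(r"[-+_.=a-z0-9]+")


def is_valid_list_name(name: str) -> bool:
    """Hypothetical helper: True if every character of name is in the allowed set."""
    return LIST_NAME_RE.fullmatch(name) is not None


current = "wikitech-announce.disabled.T100503"
renamed = current.lower()  # RobH's suggestion: change the T to t

print(is_valid_list_name(current))  # False: capital T is outside a-z
print(is_valid_list_name(renamed))  # True
```

As JohnLewis notes, removing the `T` entirely would pass this check just as well. Lowercasing is simply the smallest change.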
[01:16:13] (CR) Dzahn: "let's apply it on fermium only please" [puppet] - https://gerrit.wikimedia.org/r/231973 (https://phabricator.wikimedia.org/T82576) (owner: Ori.livneh)
[01:22:47] operations, Wikimedia-Mailing-lists: go through all directories in /var/lib/mailman and decide if migration is needed - https://phabricator.wikimedia.org/T109399#1547911 (Dzahn) NEW a:Dzahn
[01:28:57] operations, Wikimedia-Mailing-lists: go through all directories in /var/lib/mailman and decide if migration is needed - https://phabricator.wikimedia.org/T109399#1547920 (JohnLewis) archives; yes bin; no cgi-bin; no cron; no data; to my knowledge - not necessary. Just stored site password, list creator pa...
[02:05:11] operations, Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395#1547932 (Dzahn) note to self: sed -e '/^[ TABKEY]/H; x; /^Received:/!p; $!d; x; p'
[02:05:22] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-tools/snapshot is not accessible: Permission denied
[02:21:51] !log l10nupdate@tin Synchronized php-1.26wmf18/cache/l10n: l10nupdate for 1.26wmf18 (duration: 06m 50s)
[02:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:25:28] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf18) at 2015-08-18 02:25:28+00:00
[02:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:28] operations, Database, Patch-For-Review: install/setup/deploy db2043-db2070 - https://phabricator.wikimedia.org/T96383#1547946 (Krenair)
[02:26:53] operations, Database: install/setup/deploy db2043-db2070 - https://phabricator.wikimedia.org/T96383#1547949 (Krenair)
[02:45:36] operations, Traffic: upload.wikimedia.org still using old 404 error page - https://phabricator.wikimedia.org/T37053#1547958 (Krenair)
[03:41:35] operations, Database: dbtree shows 0 lag for db1047 - https://phabricator.wikimedia.org/T109401#1548001 (Krenair) NEW
[04:17:44] operations, Traffic, Easy: upload.wikimedia.org still using old 404 error page - https://phabricator.wikimedia.org/T37053#1548046 (MZMcBride)
[04:23:43] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 6.67% of data above the critical threshold [500.0]
[04:32:38] Blocked-on-Operations, operations, Traffic: upload.wikimedia.org still using old 404 error page - https://phabricator.wikimedia.org/T37053#1548056 (ori) p:Normal>High
[04:35:22] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[04:46:42] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL 11.54% of data above the critical threshold [100000000.0]
[04:56:56] operations, Mail: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1548084 (Dzahn) p:Triage>High
[04:57:01] operations, Mail: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1548086 (Dzahn) a:Dzahn
[04:59:55] operations, Mail: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1548087 (Dzahn) per https://wikimediafoundation.org/wiki/User:VBaranetsky_%28WMF%29 done --- -legal-tm-vio: slaporte, ywelinder, rstallman, mbrar, jrogers, kfrancis +legal-tm-vio: s...
[05:00:19] operations, Mail: Add vbaranetsky@wikimedia.org to legal-tm-vio exim alias - https://phabricator.wikimedia.org/T109396#1548088 (Dzahn) Open>Resolved
[05:36:02] (CR) Mattflaschen: "I work(ed) on both of these, and this makes sense to me." [mediawiki-config] - https://gerrit.wikimedia.org/r/229197 (https://phabricator.wikimedia.org/T107927) (owner: Aude)
[05:36:33] RECOVERY - Incoming network saturation on labstore1003 is OK Less than 10.00% above the threshold [75000000.0]
[05:45:47] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 18 05:45:47 UTC 2015 (duration 45m 46s)
[05:45:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:31:32] PROBLEM - puppet last run on cp2002 is CRITICAL Puppet has 2 failures
[06:32:12] PROBLEM - puppet last run on mw2043 is CRITICAL Puppet has 1 failures
[06:32:22] PROBLEM - puppet last run on mw2129 is CRITICAL Puppet has 1 failures
[06:32:23] PROBLEM - puppet last run on cp2013 is CRITICAL Puppet has 1 failures
[06:32:23] PROBLEM - puppet last run on db1045 is CRITICAL Puppet has 1 failures
[06:33:02] PROBLEM - puppet last run on mw1119 is CRITICAL Puppet has 1 failures
[06:33:12] PROBLEM - puppet last run on mw2158 is CRITICAL Puppet has 1 failures
[06:55:04] RECOVERY - puppet last run on db1045 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:56:12] RECOVERY - puppet last run on cp2002 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:57:02] RECOVERY - puppet last run on cp2013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:42] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:54] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:59:54] RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures
[07:17:51] (PS1) Gergő Tisza: Create interfaceeditor group for Hungarian Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/232241 (https://phabricator.wikimedia.org/T109408)
[07:23:53] !log live hacking on mw1017 for T109236
[07:23:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:27:32] RECOVERY - puppet last run on mw2043 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:36:09] (PS3) Giuseppe Lavagetto: service: add deployment_script define [puppet] - https://gerrit.wikimedia.org/r/231790
[07:36:22] PROBLEM - puppet last run on cp3015 is CRITICAL puppet fail
[08:00:39] operations, MediaWiki-extensions-TimedMediaHandler, Multimedia, HHVM: Convert tmh100[12] to HHVM and trusty - https://phabricator.wikimedia.org/T104747#1548221 (Joe) a:Joe
[08:01:23] RECOVERY - puppet last run on cp3015 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[08:02:18] (PS1) Giuseppe Lavagetto: videoscaler: reimage mw1152 as an experimental videoscaler [puppet] - https://gerrit.wikimedia.org/r/232242 (https://phabricator.wikimedia.org/T104747)
[08:02:36] <_joe_> godog: ^^ I think we're GTG right?
[08:03:37] _joe_: double checking but I believe so
[08:03:57] !log restart cassandra on restbase100[348] to pick up latest openjdk
[08:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:04:20] <_joe_> !log depooling mw1152 from the imagescalers pool
[08:04:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:05:37] (CR) Filippo Giunchedi: [C: 1] videoscaler: reimage mw1152 as an experimental videoscaler [puppet] - https://gerrit.wikimedia.org/r/232242 (https://phabricator.wikimedia.org/T104747) (owner: Giuseppe Lavagetto)
[08:14:39] !log restart cassandra on restbase100[569] to pick up latest openjdk
[08:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:16:04] operations, RESTBase-Cassandra, Blocked-on-Services: upgrade to latest openjdk 8 8u66-b01-1 - https://phabricator.wikimedia.org/T104888#1548232 (fgiunchedi)
[08:18:42] <_joe_> !log reimaging mw1152
[08:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:19:44] (CR) Giuseppe Lavagetto: [C: 2] videoscaler: reimage mw1152 as an experimental videoscaler [puppet] - https://gerrit.wikimedia.org/r/232242 (https://phabricator.wikimedia.org/T104747) (owner: Giuseppe Lavagetto)
[08:20:32] operations, RESTBase-Cassandra, Blocked-on-Services: upgrade to latest openjdk 8 8u66-b01-1 - https://phabricator.wikimedia.org/T104888#1548241 (fgiunchedi) Open>Resolved production cluster has been upgraded to `8u66-b01-1~bpo8+1`
[08:21:01] _joe_: \0/ !!
[08:23:52] PROBLEM - puppet last run on mira is CRITICAL Puppet has 1 failures
[08:50:05] RECOVERY - puppet last run on mira is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures
[09:11:04] PROBLEM - puppet last run on db2010 is CRITICAL puppet fail
[09:13:10] operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1548300 (Tau) >>! In T102566#1541433, @Tgr wrote: > I'll just upload the correct files then: {F1496921} > > {F1496922} I...
[09:22:55] PROBLEM - puppet last run on mw1152 is CRITICAL Puppet has 3 failures
[09:24:55] RECOVERY - puppet last run on mw1152 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures
[09:31:14] operations, Mail: Remove Alias for sj@wm.o - https://phabricator.wikimedia.org/T108276#1548344 (MoritzMuehlenhoff) Can you specify what exactly you want to have changed? Currently sklein is a redirect to meta.sj@gmail.com and sf, samuel and sam are configured as aliases. Do you want to have the redirect...
[09:32:01] operations: Monitor failing ferm restarts - https://phabricator.wikimedia.org/T108303#1548346 (MoritzMuehlenhoff) p:Triage>Normal a:MoritzMuehlenhoff
[09:35:03] operations, RESTBase-Cassandra, Patch-For-Review: better cassandra process checks - https://phabricator.wikimedia.org/T108306#1548358 (fgiunchedi)
[09:35:06] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: Test multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#1548359 (fgiunchedi)
[09:36:06] (CR) Filippo Giunchedi: "I believe it would since service units with a corresponding sysv init script get auto generated, however the multi instance work would bri" [puppet] - https://gerrit.wikimedia.org/r/230066 (https://phabricator.wikimedia.org/T108306) (owner: Filippo Giunchedi)
[09:36:13] (CR) Faidon Liambotis: "Manually? Nooooo" [puppet] - https://gerrit.wikimedia.org/r/232097 (https://phabricator.wikimedia.org/T106581) (owner: Ottomata)
[09:36:44] RECOVERY - puppet last run on db2010 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures
[09:37:07] operations, MediaWiki-extensions-TimedMediaHandler, Multimedia, Wikimedia-Video: Backport libtheora 1.2.0alpha package to Trusty - https://phabricator.wikimedia.org/T109207#1548374 (MoritzMuehlenhoff) a:fgiunchedi
[09:37:20] (PS3) Yuvipanda: Tools: Check permissions for error.log in webservice [puppet] - https://gerrit.wikimedia.org/r/231564 (https://phabricator.wikimedia.org/T99576) (owner: Tim Landscheidt)
[09:37:28] (CR) Yuvipanda: [C: 2 V: 2] Tools: Check permissions for error.log in webservice [puppet] - https://gerrit.wikimedia.org/r/231564 (https://phabricator.wikimedia.org/T99576) (owner: Tim Landscheidt)
[09:41:59] (PS1) Faidon Liambotis: smokeping: temporarily remove cr1-eqord/cr1-eqdfw [puppet] - https://gerrit.wikimedia.org/r/232249
[09:42:18] Ops-Access-Requests, operations: Grant ebernhardson access to stat1002 to query hive - https://phabricator.wikimedia.org/T109356#1548396 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[09:42:31] Ops-Access-Requests, operations: Grant SMalyshev access to stat1002 to query hive - https://phabricator.wikimedia.org/T109357#1548397 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[09:46:06] (CR) Faidon Liambotis: [C: 2] smokeping: temporarily remove cr1-eqord/cr1-eqdfw [puppet] - https://gerrit.wikimedia.org/r/232249 (owner: Faidon Liambotis)
[09:51:14] operations, Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#1548408 (jcrespo) The snapshot hosts needs mysql access to production hosts (in theory, only to snapshot hosts, but the configuration is not on puppet, but in mediawiki-config). Snapshot hos...
[09:51:44] PROBLEM - puppet last run on ruthenium is CRITICAL puppet fail
[09:54:02] operations, Database: duplicate key error on db1056 - https://phabricator.wikimedia.org/T108033#1548410 (jcrespo) Open>Resolved Resolved individual drift issues. Long term issues will be resolved on T104459 and T109179.
[09:54:43] operations, Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1548425 (Qgil) Isn't it possible to create a Phabricator task via email? Even to a private Space? Maybe we can think of a scenario where HR sends their regular email to N addresses, one...
[10:17:14] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:18:11] operations, MediaWiki-extensions-TimedMediaHandler, Multimedia, HHVM, Patch-For-Review: Convert tmh100[12] to HHVM and trusty - https://phabricator.wikimedia.org/T104747#1548502 (Joe) mw1152 is successfully reimaged as a videoscaler, and the jobrunner is stopped by default. @brion any suggestions...
[10:54:06] PROBLEM - puppet last run on cp3009 is CRITICAL puppet fail
[10:56:32] operations: Update wikimedia apt repo to include debs for shiny-server - https://phabricator.wikimedia.org/T106435#1548554 (JanZerebecki)
[10:58:17] operations: Determine Sam Reed's access rights - https://phabricator.wikimedia.org/T109386#1548558 (Aklapper)
[11:19:45] RECOVERY - puppet last run on cp3009 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[11:49:03] operations, Services: reinstall OCG servers - https://phabricator.wikimedia.org/T84723#1548644 (Aklapper) "Unbreak Now!" priority ("[[ https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities | needs to be fixed immediately, setting anything else aside ]]") for six weeks. @Dz...
[11:53:14] PROBLEM - puppet last run on ruthenium is CRITICAL Puppet has 3 failures
[12:16:45] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[12:39:33] operations, Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1548699 (Aklapper) >>! In T108131#1548425, @Qgil wrote: > Isn't it possible to create a Phabricator task via email? Yes: https://www.mediawiki.org/wiki/Phabricator/Help#Using_e-mail >...
[13:20:44] (PS1) Giuseppe Lavagetto: Do not leave stale locks on system exit or keyboard interrupt [software/conftool] - https://gerrit.wikimedia.org/r/232264
[13:22:23] (CR) Giuseppe Lavagetto: [C: 2] Do not leave stale locks on system exit or keyboard interrupt [software/conftool] - https://gerrit.wikimedia.org/r/232264 (owner: Giuseppe Lavagetto)
[13:26:05] (CR) Mobrovac: [C: -1] service: add deployment_script define (4 comments) [puppet] - https://gerrit.wikimedia.org/r/231790 (owner: Giuseppe Lavagetto)
[13:27:02] (PS1) Giuseppe Lavagetto: pybal: revert use of confd-generated files in codfw [puppet] - https://gerrit.wikimedia.org/r/232265
[13:29:25] PROBLEM - puppet last run on cp3017 is CRITICAL puppet fail
[13:30:16] <_joe_> mobrovac: thanks for reviewing it :)
[13:31:02] np _joe_
[13:31:21] i'm willing to share the blame
[13:31:22] :)
[13:32:05] <_joe_> ahah come on :)
[13:32:14] <_joe_> we must test this in beta first, ofc
[13:36:34] PROBLEM - puppet last run on rdb2004 is CRITICAL puppet fail
[13:46:36] (CR) BBlack: [C: 1] pybal: revert use of confd-generated files in codfw [puppet] - https://gerrit.wikimedia.org/r/232265 (owner: Giuseppe Lavagetto)
[13:46:38] operations, Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1548854 (mark) >>! In T108131#1548425, @Qgil wrote: > Isn't it possible to create a Phabricator task via email? Even to a private Space? Maybe we can think of a scenario where HR sends t...
[13:56:46] RECOVERY - puppet last run on cp3017 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[14:02:04] RECOVERY - puppet last run on rdb2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:14:07] operations, RESTBase-Cassandra: upgrade to latest openjdk 8 8u66-b01-1 - https://phabricator.wikimedia.org/T104888#1548954 (GWicke)
[14:14:40] operations, RESTBase-Cassandra: upgrade to latest openjdk 8 8u66-b01-1 - https://phabricator.wikimedia.org/T104888#1431055 (GWicke) Thank you, @MoritzMuehlenhoff and @fgiunchedi!
[14:23:15] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 2 below the confidence bounds
[14:41:17] (CR) Giuseppe Lavagetto: [C: 2] pybal: revert use of confd-generated files in codfw [puppet] - https://gerrit.wikimedia.org/r/232265 (owner: Giuseppe Lavagetto)
[14:42:55] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 3 below the confidence bounds
[14:44:42] operations: Determine Sam Reed's access rights - https://phabricator.wikimedia.org/T109386#1549076 (Dzahn)
[14:52:19] operations, Phabricator: Create an offboarding workflow with HR & Operations - https://phabricator.wikimedia.org/T108131#1549099 (chasemp) We should be able to do the private task via email with some of our existing custom logic. We were basically intercepting emails to a specific address in some cases an...
[14:52:51] operations: Determine Sam Reed's access rights - https://phabricator.wikimedia.org/T109386#1549100 (Dzahn) a:Reedy>RobH Confirmed that Reedy signed L2 and closed blocking task. What else is here to do?
[14:54:14] operations, Mail: Remove Alias for sj@wm.o - https://phabricator.wikimedia.org/T108276#1549103 (Dzahn) I would highly suggest to first get an ACK from Samuel.
[14:56:28] (CR) Ottomata: "Its only for the reinstalls. We want to keep the data directories around, and don't want to risk partman messing stuff up. I will put th" [puppet] - https://gerrit.wikimedia.org/r/232097 (https://phabricator.wikimedia.org/T106581) (owner: Ottomata)
[14:58:54] operations: Run assert check to verify the existence of certain texts in the footer - https://phabricator.wikimedia.org/T108081#1549118 (chasemp)
[14:59:11] Ops-Access-Requests, operations: Grant SMalyshev access to stat1002 to query hive - https://phabricator.wikimedia.org/T109357#1549120 (Dzahn) Access request tickets should have some kind of reasoning why access is needed. It can be minimal, just a few words, but could you add that? Thanks!
[14:59:46] operations: Update wikimedia apt repo to include debs for shiny-server - https://phabricator.wikimedia.org/T106435#1549122 (EBernhardson) Open>declined a:EBernhardson wrote some shell scripts, will see how they work out.
[15:00:05] anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150818T1500).
[15:00:12] Ops-Access-Requests, operations: Grant SMalyshev access to stat1002 to query hive - https://phabricator.wikimedia.org/T109357#1549132 (Smalyshev) I need to access to statistics so I can monitor usage of WDQS and once ElasticSearch statistics also moved there, that statistics too.
[15:01:07] I can SWAT— James_F ping!
[15:01:11] Heya.
[15:01:19] (It didn't ping me?)
[15:01:42] yeah, I think there must be some kind of time window after which, if you add something, you don't get pinged.
[15:01:54] that seems to be what happens in practice anyway
[15:02:06] Ah well.
[15:02:08] Hey :-)
[15:03:48] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/232228 (owner: Brion VIBBER)
[15:03:55] (Merged) jenkins-bot: Remove extra transcode enablings; no longer needed [mediawiki-config] - https://gerrit.wikimedia.org/r/232228 (owner: Brion VIBBER)
[15:04:05] !log rebooting labvirt1006
[15:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master