[00:00:04] <jouncebot>	 twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T0000).
[00:01:26] <Dereckson>	 Completed at 2016-05-25 23:59:29+00:00. Copying LC files to /srv/mediawiki-staging
[00:01:30] <Dereckson>	 00:01:18 Updated 392 JSON file(s) in /srv/mediawiki-staging/php-1.28.0-wmf.3/cache/l10n
[00:01:36] <Dereckson>	 syncing
[00:07:11] <grrrit-wm>	 (PS3) Dzahn: logging: move files/misc/demux.py to modules/udp2log [puppet] - https://gerrit.wikimedia.org/r/289353
[00:07:20] <grrrit-wm>	 (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/2922/" [puppet] - https://gerrit.wikimedia.org/r/289353 (owner: Dzahn)
[00:08:08] <icinga-wm>	 PROBLEM - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: Connection refused
[00:08:43] <urandom>	 ^^^ got it
[00:09:20] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is CRITICAL: Connection refused eevans Node is bootstrapping. - The acknowledgement expires at: 2016-05-28 00:08:58.
[00:09:37] <mutante>	 :)
[00:15:01] <grrrit-wm>	 (PS1) Eevans: filter out new metrics [puppet] - https://gerrit.wikimedia.org/r/290860
[00:16:49] <wikibugs>	 Operations, Performance-Team, Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2328751 (Gilles)
[00:17:35] <logmsgbot>	 !log dereckson@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 16m 15s)
[00:17:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:18:08] * Dereckson sighs
[00:18:34] <Dereckson>	 bd808: I'm getting a lot of permission denied warnings like:
[00:18:37] <Dereckson>	 aywikibooks:  [Thu May 26 00:18:26 2016] [hphp] [5591:7f2ad124f100:0:000001] [] 
[00:18:40] <Dereckson>	 aywikibooks:  Warning: rename(): Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 189
[00:18:50] <bd808>	 yeah. I think those are ok
[00:18:55] <Dereckson>	 k
[00:19:07] <bd808>	 We should open a bug and make sure though
[00:19:28] <bd808>	 something is running as the wrong user. those should all be owned by www-data
[00:20:07] <bd808>	 It may be just a quirk on tin from some other process
[00:20:38] <grrrit-wm>	 (PS2) Dzahn: let chromium use jessie installer [puppet] - https://gerrit.wikimedia.org/r/290347
[00:22:24] <grrrit-wm>	 (CR) Dzahn: [C: 2] let chromium use jessie installer [puppet] - https://gerrit.wikimedia.org/r/290347 (owner: Dzahn)
[00:22:36] <Dereckson>	 Filed as https://phabricator.wikimedia.org/T136258
[00:22:42] <wikibugs>	 Operations, Deployment-Systems, Scap3: Warning: rename(): Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 189 - https://phabricator.wikimedia.org/T136258#2328753 (Dereckson)
[00:27:50] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Thu May 26 00:27:50 UTC 2016 (duration 10m 15s)
[00:27:59] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:29:02] <Dereckson>	 All done
[00:29:03] <Dereckson>	 [dereckson@tin ~]$ 
[00:31:39] <Dereckson>	 bd808: https://commons.wikimedia.org/wiki/MediaWiki:Group-wmf-supportsafety-member: the string hasn't been imported. Is it possible that this process updates only *existing strings* but doesn't import new strings?
[00:32:09] <Dereckson>	 We expected to get new strings from https://gerrit.wikimedia.org/r/#/c/290581/
[00:32:25] <bd808>	 Dereckson: hmmm... I don't remember honestly
[00:35:38] <bd808>	 Dereckson: I'm not sure that wmf-messages is set up to actually read those keys -- https://phabricator.wikimedia.org/diffusion/EWME/browse/master/WikimediaMessages.hooks.php
[00:35:53] <bd808>	 that extension does weird stuff
[00:35:55] <wikibugs>	 Operations, Puppet, Beta-Cluster-Infrastructure: Hiera hierarchy hieradata/role/* is not applied on labs (eg  deployment-prep) - https://phabricator.wikimedia.org/T136080#2322414 (scfc) Is this task a duplicate of T120165?
[00:37:17] <Dereckson>	 bd808: k I'm backporting it to php-1.28.0-wmf.3
[00:37:48] <wikibugs>	 Operations, Puppet, Beta-Cluster-Infrastructure: Hiera hierarchy hieradata/role/* is not applied on labs (eg  deployment-prep) - https://phabricator.wikimedia.org/T136080#2322414 (Dzahn) yes, i think that's a duplicate. a real "merge" of the content is still being missed for these cases
[00:37:53] <bd808>	 if the extension isn't actually grabbing them in the onMessageCacheGet hook that won't help
[00:38:00] <bd808>	 have you tested on beta cluster?
[00:38:31] <Dereckson>	 hmmm I thought these keys were for special handling.
[00:39:06] <Dereckson>	 hey Krenair how did you deploy the new name for the en.wiki extendedconfirmed group?
[00:39:45] <bd808>	 Dereckson: you may be right. the message for https://commons.wikimedia.org/wiki/MediaWiki:Group-wmf-researcher-member is there
[00:40:00] <Dereckson>	 Oh
[00:40:03] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[00:40:05] <Dereckson>	 so it worked, but it was only a cache issue
[00:40:07] <Dereckson>	 good news
[00:40:18] <bd808>	 that is an older message in the file
[00:40:28] <Dereckson>	 ah
[00:40:41] <wikibugs>	 Operations, Puppet, Labs: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#1847021 (Dzahn) https://wikitech.wikimedia.org/wiki/Puppet_Hiera#Role-based_lookup  It's that "the new parser function/keyword, called role" is something that we (Joe) made ourself and do...
[00:40:45] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[00:41:10] * Dereckson checks the deployments archive.
[00:45:41] <Dereckson>	 For the last similar config changes with new messages, we didn't deploy WikimediaMessages, only merged to master
[00:46:27] <grrrit-wm>	 (PS2) Dzahn: snapshot: one file per role class, move to modules/role [puppet] - https://gerrit.wikimedia.org/r/286165
[00:48:22] <Dereckson>	 https://wikitech.wikimedia.org/wiki/Deployments/Archive/2015/01#deploycal-item-20150106T0000
[00:48:33] <Dereckson>	 aude did a backport
[00:49:03] <Dereckson>	 so apparently yes, it's the right process
[00:50:27] <Dereckson>	 AaronSchulz:     Use correct module name for stats in executeActionWithErrorHandling()
[00:51:09] <Dereckson>	 AaronSchulz: this is an undeployed change on wmf3 branch
[00:51:28] <Dereckson>	 https://gerrit.wikimedia.org/r/#/c/290836/
[00:51:30] <AaronSchulz>	 I was staring at the WikimediaMessages thing that was in HEAD..origin/wmf
[00:52:27] <logmsgbot>	 !log aaron@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiMain.php: 01e68e966413c (duration: 00m 29s)
[00:52:37] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:53:36] <wikibugs>	 Operations, Deployment-Systems, Scap3: Warning: rename(): Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 189 - https://phabricator.wikimedia.org/T136258#2328799 (bd808) These are temp files for the WM globals cache. The files in /tmp are being created with l10nupdate:l10nup...
[00:57:51] <logmsgbot>	 !log dereckson@tin Synchronized php-1.28.0-wmf.3/extensions/WikimediaMessages/i18n/wikimedia: Add i18n messages for new Support and Safety group (duration: 00m 26s)
[00:58:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:58:56] <Dereckson>	 Okay, now the 2am l10nupdate l10n cache refresh should propagate the l10n key.
[00:59:43] * foks pokes head in
[01:03:02] <Dereckson>	 foks: TL;DR for *existing* messages, we have a working job to propagate new changes, but it seems it doesn't work for *NEW* messages. I checked in the deployment/server admin log how previous similar WikimediaMessages changes were handled and did the same: cherry-pick to current wmf branch, sync i18n folder. In 30 minutes, l10nupdate will refresh the cache according to the keys in files.
[01:03:35] <Dereckson>	 foks: so once l10nupdate has finished for wmf3 (not wmf2) and reported here that it's done, you can test at https://commons.wikimedia.org/wiki/MediaWiki:Group-wmf-supportsafety
[01:03:57] <foks>	 Ah okay
[01:21:27] <wikibugs>	 Operations, Analytics, Performance-Team, Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2328835 (Jdlrobson)
[01:23:20] <grrrit-wm>	 (PS3) Dzahn: snapshot: one file per role class, move to modules/role [puppet] - https://gerrit.wikimedia.org/r/286165
[01:24:20] <wikibugs>	 Operations, ops-eqiad, fundraising-tech-ops: Rack and setup Fundraising DB - https://phabricator.wikimedia.org/T136200#2328837 (Jgreen)
[01:26:24] <grrrit-wm>	 (PS1) BryanDavis: foreachwikiindblist: Fix sudo guard and cleanup script [puppet] - https://gerrit.wikimedia.org/r/290863 (https://phabricator.wikimedia.org/T136258)
[01:27:23] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] foreachwikiindblist: Fix sudo guard and cleanup script [puppet] - https://gerrit.wikimedia.org/r/290863 (https://phabricator.wikimedia.org/T136258) (owner: BryanDavis)
[01:28:02] <grrrit-wm>	 (PS2) BryanDavis: foreachwikiindblist: Fix sudo guard and cleanup script [puppet] - https://gerrit.wikimedia.org/r/290863 (https://phabricator.wikimedia.org/T136258)
[01:28:45] <grrrit-wm>	 (PS4) Dzahn: snapshot: one file per role class, move to modules/role [puppet] - https://gerrit.wikimedia.org/r/286165
[01:29:48] <grrrit-wm>	 (PS5) Dzahn: snapshot: one file per role class, move to modules/role [puppet] - https://gerrit.wikimedia.org/r/286165
[01:30:18] <grrrit-wm>	 (CR) Dzahn: [C: 2] snapshot: one file per role class, move to modules/role [puppet] - https://gerrit.wikimedia.org/r/286165 (owner: Dzahn)
[01:33:22] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[01:35:21] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[01:40:17] <grrrit-wm>	 (PS1) Dzahn: snapshot: follow-up to move role classes [puppet] - https://gerrit.wikimedia.org/r/290864
[01:40:32] <icinga-wm>	 PROBLEM - puppet last run on snapshot1003 is CRITICAL: CRITICAL: puppet fail
[01:42:25] <grrrit-wm>	 (PS2) Dzahn: snapshot: follow-up to move role classes [puppet] - https://gerrit.wikimedia.org/r/290864
[01:45:23] <grrrit-wm>	 (PS3) Dzahn: snapshot: follow-up to move role classes [puppet] - https://gerrit.wikimedia.org/r/290864
[01:45:32] <grrrit-wm>	 (CR) Dzahn: [C: 2] snapshot: follow-up to move role classes [puppet] - https://gerrit.wikimedia.org/r/290864 (owner: Dzahn)
[01:47:22] <mutante>	 !log mw1249 - restart hhvm
[01:47:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:48:01] <icinga-wm>	 RECOVERY - Apache HTTP on mw1249 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.064 second response time
[01:48:11] <icinga-wm>	 RECOVERY - HHVM rendering on mw1249 is OK: HTTP OK: HTTP/1.1 200 OK - 67693 bytes in 0.222 second response time
[01:49:33] <grrrit-wm>	 (CR) Dzahn: "checked on every single snapshot host.. after https://gerrit.wikimedia.org/r/#/c/290864/ all is unchanged" [puppet] - https://gerrit.wikimedia.org/r/286165 (owner: Dzahn)
[01:50:21] <icinga-wm>	 RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:51:09] <grrrit-wm>	 (CR) Dzahn: "i don't remember the WIP part meanwhile.. been too long" [software] - https://gerrit.wikimedia.org/r/276890 (owner: Dzahn)
[01:55:30] <grrrit-wm>	 (PS2) Dzahn: syslog: move role class to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/286164
[01:55:51] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] syslog: move role class to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/286164 (owner: Dzahn)
[01:56:23] <grrrit-wm>	 (PS3) Dzahn: syslog: move role class to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/286164
[01:57:55] <grrrit-wm>	 (CR) Dzahn: [C: 2] syslog: move role class to autoloader layout [puppet] - https://gerrit.wikimedia.org/r/286164 (owner: Dzahn)
[02:01:05] <wikibugs>	 Operations, Patch-For-Review: install font packages on all appservers, not just imagescalers (was: Install fonts-wqy-zenhei on all mediawiki app servers) - https://phabricator.wikimedia.org/T84777#2328857 (Dzahn) @joe If "Timelines aren't rendered on image scalers. They're rendered on standard mediawiki...
[02:24:39] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.2) (duration: 10m 33s)
[02:24:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:56:18] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 15m 49s)
[02:56:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:05:45] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Thu May 26 03:05:45 UTC 2016 (duration 9m 27s)
[03:05:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:12:12] <grrrit-wm>	 (PS1) BryanDavis: CommonSettings: cleanup temp cache file if rename fails [mediawiki-config] - https://gerrit.wikimedia.org/r/290867 (https://phabricator.wikimedia.org/T136258)
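The idea named in the patch title above (write a cache to a temp file, rename it into place, and clean up the temp file if the rename fails) can be sketched roughly as follows. This is a Python illustration only, not the PHP code from CommonSettings.php or from the patch; the function name `write_cache_atomically` and the `.tmp-` prefix are hypothetical.

```python
import os
import tempfile


def write_cache_atomically(path, data):
    """Write data to path via a temp file; remove the temp file on failure.

    Hypothetical helper: it mirrors the "cleanup temp cache file if rename
    fails" idea, not the actual MediaWiki configuration code.
    """
    directory = os.path.dirname(path) or "."
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # atomic rename; raises OSError on failure
        return True
    except OSError:
        # The rename (or write) failed, e.g. "Permission denied":
        # do not leave the temp file behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        return False
```

A caller would treat a `False` return as "cache not updated" and carry on with the uncached values.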
[04:17:20] <grrrit-wm>	 (PS7) Ori.livneh: Convert mwgrep to use regexp by default [puppet] - https://gerrit.wikimedia.org/r/283107 (owner: EBernhardson)
[04:17:31] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] Convert mwgrep to use regexp by default [puppet] - https://gerrit.wikimedia.org/r/283107 (owner: EBernhardson)
[04:29:37] <wikibugs>	 Operations, Deployment-Systems, Patch-For-Review, Scap3: Warning: rename(): Permission denied in /srv/mediawiki/wmf-config/CommonSettings.php on line 189 - https://phabricator.wikimedia.org/T136258#2328938 (bd808) Temporary files left by l10nupdate failures can be cleaned up with: ``` sudo -u l10...
[04:43:01] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 633 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6476307 keys - replication_delay is 633
[04:47:15] <wikibugs>	 Operations, Discovery, Maps, Tilerator, Discovery-Maps-Sprint: water_polygons import is broken - https://phabricator.wikimedia.org/T112831#1647374 (Yurik) @maxsem, is this still relevant?
[04:49:51] <grrrit-wm>	 (PS1) Ori.livneh: grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868
[04:50:53] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868 (owner: Ori.livneh)
[04:51:16] <grrrit-wm>	 (PS2) Ori.livneh: grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868
[04:52:21] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868 (owner: Ori.livneh)
[04:52:31] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6471457 keys - replication_delay is 0
[04:56:40] <grrrit-wm>	 (PS3) Ori.livneh: grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868
[05:02:02] <grrrit-wm>	 (PS1) Ori.livneh: grafana: add wmf branding [puppet] - https://gerrit.wikimedia.org/r/290869
[05:02:26] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] grafana: disable automatic update checking and external snapshots [puppet] - https://gerrit.wikimedia.org/r/290868 (owner: Ori.livneh)
[05:04:51] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] grafana: add wmf branding [puppet] - https://gerrit.wikimedia.org/r/290869 (owner: Ori.livneh)
[05:32:50] <icinga-wm>	 PROBLEM - puppet last run on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:33:01] <icinga-wm>	 PROBLEM - configured eth on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:33:20] <icinga-wm>	 PROBLEM - dhclient process on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:33:21] <icinga-wm>	 PROBLEM - Disk space on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:33:40] <icinga-wm>	 PROBLEM - Check size of conntrack table on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:33:40] <icinga-wm>	 PROBLEM - salt-minion processes on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:34:00] <icinga-wm>	 PROBLEM - DPKG on planet2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:34:10] <mutante>	 nah, it's ok
[05:34:20] <_joe_>	 yes, I was checking too
[05:34:32] <icinga-wm>	 RECOVERY - puppet last run on planet2001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[05:34:42] <_joe_>	 load average at 20, but going down
[05:34:51] <icinga-wm>	 RECOVERY - configured eth on planet2001 is OK: OK - interfaces up
[05:34:57] <_joe_>	 the ganeti bug, for sure
[05:35:01] <icinga-wm>	 RECOVERY - dhclient process on planet2001 is OK: PROCS OK: 0 processes with command name dhclient
[05:35:11] <icinga-wm>	 RECOVERY - Disk space on planet2001 is OK: DISK OK
[05:35:21] <icinga-wm>	 RECOVERY - Check size of conntrack table on planet2001 is OK: OK: nf_conntrack is 0 % full
[05:35:21] <icinga-wm>	 RECOVERY - salt-minion processes on planet2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:35:38] <mutante>	 yea, it doesn't get traffic and the crons are deactivated. this is just there for failover from planet1001
[05:35:41] <icinga-wm>	 RECOVERY - DPKG on planet2001 is OK: All packages OK
[05:35:49] <mutante>	 and ack @ ganeti
[05:41:58] <mutante>	 _joe_: for anytime later..  not urgent at all.. i wonder if there is a reason _not_ to install the font packages we install on imagescalers just on all appservers. https://gerrit.wikimedia.org/r/#/c/231284/  and https://phabricator.wikimedia.org/T84777#2328857
[05:42:57] <_joe_>	 mutante: I don't see a reason not to, but had no time to think about it
[05:43:02] <_joe_>	 I have seen your patch though
[05:43:28] <mutante>	 alright! thanks
[05:50:14] <_joe_>	 !log starting upgrades of hhvm to newer libicu in codfw (T86096)
[05:50:15] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[05:50:24] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:52:52] <grrrit-wm>	 (CR) Dzahn: "this should be fine but in networks.pp $mw_appserver_networks = ['208.80.152.0/22'] does not cover labtestweb2001.wikimedia.org has addres" [puppet] - https://gerrit.wikimedia.org/r/290348 (owner: Alex Monk)
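The kind of coverage check described in the review comment above (is a host's address inside any configured network range?) can be sketched with Python's standard `ipaddress` module. Only the `208.80.152.0/22` range comes from the log; the helper name `covered_by` and both sample addresses are hypothetical, since the host's actual address is cut off in the quoted comment.

```python
import ipaddress
from typing import Iterable


def covered_by(address: str, networks: Iterable[str]) -> bool:
    """Return True if address falls inside any of the given CIDR networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in networks)


# Range quoted in the review comment; a /22 spans 208.80.152.0 to 208.80.155.255.
mw_appserver_networks = ["208.80.152.0/22"]

# Hypothetical sample addresses, chosen only to show both outcomes.
print(covered_by("208.80.153.10", mw_appserver_networks))  # inside the /22
print(covered_by("208.80.156.1", mw_appserver_networks))   # just outside the /22
```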
[05:59:37] <grrrit-wm>	 (PS1) Dzahn: udp2log: move icinga checks from ./files/ to module [puppet] - https://gerrit.wikimedia.org/r/290871
[06:05:25] <grrrit-wm>	 (PS1) Dzahn: udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - https://gerrit.wikimedia.org/r/290872
[06:06:08] <grrrit-wm>	 (CR) Mobrovac: "recheck" [puppet] - https://gerrit.wikimedia.org/r/290786 (owner: Mobrovac)
[06:11:21] <grrrit-wm>	 (PS1) Dzahn: move/copy ubuntu-cloud.key into openstack/swift modules [puppet] - https://gerrit.wikimedia.org/r/290874
[06:16:29] <grrrit-wm>	 (PS1) Dzahn: varnish: mv wikimedia_vcl, netmapper_upd to separate files [puppet] - https://gerrit.wikimedia.org/r/290875
[06:17:38] <grrrit-wm>	 (PS2) Dzahn: varnish: mv wikimedia_vcl, netmapper_upd to separate files [puppet] - https://gerrit.wikimedia.org/r/290875
[06:20:45] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - https://gerrit.wikimedia.org/r/290872 (owner: Dzahn)
[06:20:53] <grrrit-wm>	 (PS1) Dzahn: varnish: move errorpage.html from misc to module [puppet] - https://gerrit.wikimedia.org/r/290876
[06:21:54] <grrrit-wm>	 (CR) Dzahn: "gotta love the PEP8 fail when you are just moving .py files from one place to another :p" [puppet] - https://gerrit.wikimedia.org/r/290872 (owner: Dzahn)
[06:24:52] <grrrit-wm>	 (PS1) Dzahn: nagios: move check_command/config to own file [puppet] - https://gerrit.wikimedia.org/r/290877
[06:26:47] <grrrit-wm>	 (CR) Ori.livneh: [C: 1] "small cosmetic issue, lgtm otherwise" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/290455 (owner: Filippo Giunchedi)
[06:26:58] <grrrit-wm>	 (PS2) Dzahn: RESTBase: Remove purging config [puppet] - https://gerrit.wikimedia.org/r/290786 (owner: Mobrovac)
[06:30:41] <icinga-wm>	 PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:41] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:20] <icinga-wm>	 PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:30] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:01] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:50] <icinga-wm>	 PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:38:26] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] varnish: move errorpage.html from misc to module [puppet] - https://gerrit.wikimedia.org/r/290876 (owner: Dzahn)
[06:38:42] <grrrit-wm>	 (CR) Ori.livneh: [C: 1] "Looks good; didn't test. Few comments inline" (5 comments) [puppet] - https://gerrit.wikimedia.org/r/280652 (https://phabricator.wikimedia.org/T126785) (owner: Filippo Giunchedi)
[06:39:01] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:41:32] <_joe_>	 !log upgrading hhvm on the eqiad canaries, T86096
[06:41:34] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[06:41:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:43:03] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] RESTBase: Remove purging config [puppet] - https://gerrit.wikimedia.org/r/290786 (owner: Mobrovac)
[06:43:13] <mobrovac>	 heh
[06:43:50] <Nikerabbit>	 does someone know why wikidata went back to wmf.2?
[06:50:32] <ori>	 Nikerabbit: https://dpaste.de/GRsq/raw
[06:50:56] <Nikerabbit>	 ori: kthanks
[06:51:31] <grrrit-wm>	 (PS3) Muehlenhoff: Only require firejail on trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/290723
[06:53:03] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Only require firejail on trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/290723 (owner: Muehlenhoff)
[06:55:51] <icinga-wm>	 RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:56:01] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:56:11] <icinga-wm>	 RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:57:00] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:57:30] <icinga-wm>	 RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:32] <grrrit-wm>	 (PS2) Ori.livneh: wmflib: allow require_package('g++') [puppet] - https://gerrit.wikimedia.org/r/290697 (owner: Hashar)
[06:57:47] <grrrit-wm>	 (CR) Ori.livneh: [C: 2 V: 2] "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/290697 (owner: Hashar)
[06:58:01] <icinga-wm>	 RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:02:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw2061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:04:07] <icinga-wm>	 RECOVERY - HHVM rendering on mw2061 is OK: HTTP OK: HTTP/1.1 200 OK - 67709 bytes in 0.270 second response time
[07:20:08] <icinga-wm>	 PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:29:00] <_joe_>	 !log upgrading hhvm on the eqiad imagescalers, T86096
[07:29:01] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[07:29:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:31:06] <dcausse>	 !log elastic: updating cirrussearch warmers on eqiad and codfw
[07:31:13] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:36:48] <_joe_>	 !log upgrading hhvm on eqiad jobrunners, tin + terbium (T86096)
[07:36:49] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[07:36:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:37:17] <grrrit-wm>	 (PS2) Muehlenhoff: Add ferm rules for role::snapshot::dumper [puppet] - https://gerrit.wikimedia.org/r/290421
[07:38:39] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Add ferm rules for role::snapshot::dumper [puppet] - https://gerrit.wikimedia.org/r/290421 (owner: Muehlenhoff)
[07:40:21] <grrrit-wm>	 (PS3) Muehlenhoff: Enable base::firewall for new snapshot hosts [puppet] - https://gerrit.wikimedia.org/r/290422
[07:42:43] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Enable base::firewall for new snapshot hosts [puppet] - https://gerrit.wikimedia.org/r/290422 (owner: Muehlenhoff)
[07:43:36] <mobrovac>	 !log restbase starting partial mobile-sections dump of enwiki for T135571 on restbase1009
[07:43:37] <stashbot>	 T135571: [BUG] [Content Service] Tapping random causes an unknown error sometimes - https://phabricator.wikimedia.org/T135571
[07:43:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:45:11] <grrrit-wm>	 (CR) Alexandros Kosiaris: [C: 2] Change Prop: Purge RESTBase re-renders [puppet] - https://gerrit.wikimedia.org/r/290748 (owner: Mobrovac)
[07:45:17] <grrrit-wm>	 (PS3) Alexandros Kosiaris: Change Prop: Purge RESTBase re-renders [puppet] - https://gerrit.wikimedia.org/r/290748 (owner: Mobrovac)
[07:45:22] <grrrit-wm>	 (CR) Alexandros Kosiaris: [V: 2] Change Prop: Purge RESTBase re-renders [puppet] - https://gerrit.wikimedia.org/r/290748 (owner: Mobrovac)
[07:46:37] <icinga-wm>	 RECOVERY - puppet last run on mw2087 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[07:48:30] <mobrovac>	 akosiaris: ran puppet on scb or should i?
[07:48:46] <mobrovac>	 (cp shouldn't be restarted, i'll do it)
[07:49:34] <akosiaris>	 I ran puppet though
[07:49:37] <icinga-wm>	 PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: puppet fail
[07:50:09] <_joe_>	 !log upgrading hhvm on eqiad's api cluster, (T86096)
[07:50:10] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[07:50:17] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:51:14] <mobrovac>	 kk, restarting
[07:58:27] <icinga-wm>	 PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:59:05] <mobrovac>	 akosiaris: hm, wait, something's wrong
[07:59:08] * mobrovac investigating
[08:00:14] <mobrovac>	 akosiaris: nm, false alarm
[08:00:17] <icinga-wm>	 PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:01:10] <grrrit-wm>	 (PS1) Muehlenhoff: Drop deployment-ssh rules from role::snapshot::dumper [puppet] - https://gerrit.wikimedia.org/r/290882
[08:02:30] <mobrovac>	 akosiaris: ok, let's wait for 10 mins or so so that things settle down a bit and then continue with the rb part?
[08:02:59] <grrrit-wm>	 (CR) Muehlenhoff: [C: 2 V: 2] Drop deployment-ssh rules from role::snapshot::dumper [puppet] - https://gerrit.wikimedia.org/r/290882 (owner: Muehlenhoff)
[08:05:27] <icinga-wm>	 RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[08:06:07] <hashar>	 moritzm: thank you for the firejail / gallium fix up yesterday :)
[08:07:47] <moritzm>	 hashar: yw, I wasn't sure whether we run any CI tests on videoscaler-specific tasks? because if we do I'll also need to build firejail for precise since I'm in the process of moving that to use it
[08:13:26] <hashar>	 moritzm: for the MediaWiki PHPUnit tests  we need a wide range of .deb packages 
[08:13:39] <hashar>	 and the easiest way I found to ship those .deb on the CI box has been to include  mediawiki::packages
[08:13:48] <hashar>	 which in turn includes a lot of different classes and packages
[08:13:53] <hashar>	 then
[08:14:09] <hashar>	 gallium is no longer running such tests, the related puppet class needs a lot of cleanup
[08:14:25] <hashar>	 we still have a Precise box for the old release, and I am not sure whether firejail would be needed there or not
[08:15:33] <_joe_>	 !log upgrading hhvm on eqiad's appserver cluster, (T86096)
[08:15:34] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[08:15:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:15:52] <moritzm>	 hashar: ok, if it turns out to be needed, I can still build it, just didn't want to waste too much time on a deprecated OS
[08:16:16] <hashar>	 moritzm: is firejail something similar to app armor profile?  Ie you would run :  firejail somenasty.sh ?
[08:16:37] <hashar>	 I would assume some  $wg variable would be set to enable firejail 
[08:17:37] <moritzm>	 hashar: yes, $wgImageMagickConvertCommand will be set to a wrapper which invokes firejail and the actual imagemagick convert
[08:19:13] <hashar>	 I guess that is going to be done in operations/mediawiki-config/ which we do not use for tests
[08:19:37] <hashar>	 so the CI jobs would be stuck with whatever is defined in MediaWiki includes/DefaultSettings.php which is: includes/DefaultSettings.php:$wgImageMagickConvertCommand = '/usr/bin/convert';
[08:19:40] <hashar>	 ie no firejail
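The wrapper arrangement discussed above (the configured convert command points at something that invokes firejail, which in turn runs the real ImageMagick convert) can be sketched as a command builder. This is a Python illustration under assumptions, not the wrapper moritzm describes: the function name `firejail_convert_argv` and the profile path are hypothetical, while `/usr/bin/convert` is the default quoted from DefaultSettings.php and `--profile=` is firejail's option for selecting a profile file.

```python
def firejail_convert_argv(convert_args,
                          profile="/etc/firejail/example-convert.profile",
                          convert_bin="/usr/bin/convert"):
    """Build the argv that runs ImageMagick convert inside firejail.

    The profile path is a made-up placeholder; a real deployment would
    point at whatever sandbox profile it ships.
    """
    return ["firejail", f"--profile={profile}", convert_bin, *convert_args]


# Example: the command a wrapper might execute for a simple resize.
argv = firejail_convert_argv(["in.png", "-resize", "120x120", "out.png"])
print(" ".join(argv))
```

Without such a wrapper, a setup that only reads the default setting runs `/usr/bin/convert` directly, which is the "no firejail" case hashar points out for CI.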
[08:20:02] <hashar>	 eventually one day we might want to have some integration tests that rely on imagemagick with a firejail profile
[08:20:15] <hashar>	 maybe that can be added straight into mediawiki/core ;)
[08:20:48] <moritzm>	 hashar: ok, great!
[08:22:49] <grrrit-wm>	 (PS1) DCausse: Elastic: add support for network.host [puppet] - https://gerrit.wikimedia.org/r/290883
[08:23:50] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Elastic: add support for network.host [puppet] - https://gerrit.wikimedia.org/r/290883 (owner: DCausse)
[08:25:30] <grrrit-wm>	 (PS2) DCausse: Elastic: add support for network.host [puppet] - https://gerrit.wikimedia.org/r/290883
[08:25:58] <icinga-wm>	 RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:26:16] <icinga-wm>	 RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:29:28] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Elastic: add support for network.host [puppet] - 10https://gerrit.wikimedia.org/r/290883 (owner: 10DCausse)
[08:30:47] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:35:59] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: wikistats: Fix pplint error in wikistats::db [puppet] - 10https://gerrit.wikimedia.org/r/290885 
[08:36:37] <moritzm>	 !log installing openssh security updates on trusty systems
[08:36:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:38:10] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] wikistats: Fix pplint error in wikistats::db [puppet] - 10https://gerrit.wikimedia.org/r/290885 (owner: 10Alexandros Kosiaris)
[08:39:20] <grrrit-wm>	 (03PS3) 10Alexandros Kosiaris: RESTBase: Remove purging config [puppet] - 10https://gerrit.wikimedia.org/r/290786 (owner: 10Mobrovac)
[08:40:23] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] RESTBase: Remove purging config [puppet] - 10https://gerrit.wikimedia.org/r/290786 (owner: 10Mobrovac)
[08:43:47] <grrrit-wm>	 (03PS1) 10DCausse: Cirrus: disable the safeifier in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290886 
[08:45:27] <moritzm>	 !log powercycling snapshot1004 (stuck after reboot)
[08:45:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:46:51] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Cirrus: disable the safeifier in labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290886 (owner: 10DCausse)
[08:51:02] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2329295 (10elukey) I was able to partition the new disk with ext4, but it has appeared under /dev/sda rather than /dev/sdd.   Quick recap about the analytics config from https...
[08:53:04] <logmsgbot>	 !log dcausse@tin Synchronized wmf-config/CirrusSearch-labs.php: Cirrus: disable the safeifier in labs (duration: 02m 36s)
[08:53:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:54:34] <icinga-wm>	 PROBLEM - puppet last run on mw2130 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:55:34] <mobrovac>	 !log restbase deployment start of bd38b1b
[08:55:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:56:30] <_joe_>	 oh you're deploying restbase
[08:56:40] <mobrovac>	 yes
[08:56:41] <_joe_>	 I was tailing pybal logs and saw rb hosts failing 
[08:56:42] <_joe_>	 :P
[08:56:43] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[08:56:44] <icinga-wm>	 PROBLEM - puppet last run on mw2107 is CRITICAL: CRITICAL: Puppet has 1 failures
[08:56:48] <mobrovac>	 hehe
[08:58:06] <grrrit-wm>	 (03PS1) 10Hashar: rsync: allow extra settings in rsyncd.conf [puppet] - 10https://gerrit.wikimedia.org/r/290895 (https://phabricator.wikimedia.org/T136276) 
[08:58:08] <logmsgbot>	 !log dcausse@tin Synchronized wmf-config/CirrusSearch-labs.php: Cirrus: disable the safeifier in labs (duration: 00m 25s)
[08:58:08] <grrrit-wm>	 (03PS1) 10Hashar: contint: disable DNS lookup for castor rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276) 
[08:58:15] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:58:24] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1] "untested" [puppet] - 10https://gerrit.wikimedia.org/r/290895 (https://phabricator.wikimedia.org/T136276) (owner: 10Hashar)
[08:58:30] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1] "untested" [puppet] - 10https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276) (owner: 10Hashar)
[08:59:15] <grrrit-wm>	 (03PS5) 10Filippo Giunchedi: graphite: split uwsgi logs to separate files [puppet] - 10https://gerrit.wikimedia.org/r/290455 
[09:00:36] <grrrit-wm>	 (03PS6) 10Filippo Giunchedi: graphite: split uwsgi logs to separate files [puppet] - 10https://gerrit.wikimedia.org/r/290455 
[09:00:43] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] graphite: split uwsgi logs to separate files [puppet] - 10https://gerrit.wikimedia.org/r/290455 (owner: 10Filippo Giunchedi)
[09:05:40] <mobrovac>	 !log restbase deployment end of bd38b1b
[09:05:47] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:07:20] <jynus>	 !log converting user table on labswiki to utf8
[09:07:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:07:48] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: uwsgi: use @basename not @title in syslog [puppet] - 10https://gerrit.wikimedia.org/r/290899 
[09:09:50] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] uwsgi: use @basename not @title in syslog [puppet] - 10https://gerrit.wikimedia.org/r/290899 (owner: 10Filippo Giunchedi)
[09:12:32] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2329330 (10elukey) Yes single disk raid0 virtual drive seems to be the way:  ``` elukey@analytics1047:~$ sudo megacli -LDInfo -L2 -a0   Adapter 0 -- Virtual Drive Information:...
[09:13:28] <grrrit-wm>	 (03PS5) 10Filippo Giunchedi: service::node: Prepare for scap3 config deploys [puppet] - 10https://gerrit.wikimedia.org/r/290490 (owner: 10Mobrovac)
[09:13:36] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] service::node: Prepare for scap3 config deploys [puppet] - 10https://gerrit.wikimedia.org/r/290490 (owner: 10Mobrovac)
[09:15:15] <godog>	 mobrovac: ^
[09:15:23] <jynus>	 tgr, can you check horizon log?
[09:15:30] <jynus>	 *logging
[09:15:32] <mobrovac>	 thnx godog!
[09:15:42] <godog>	 np
[09:15:47] <mobrovac>	 godog: i'll run puppet on scb
[09:17:00] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] contint: disable DNS lookup for castor rsyncd [puppet] - 10https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276) (owner: 10Hashar)
[09:17:14] <tgr>	 jynus: I still get "An error occurred authenticating. Please try again later."
[09:18:15] <jynus>	 can I copy that^ to the ticket?
[09:18:34] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:19:35] <icinga-wm>	 PROBLEM - DPKG on mw2152 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:20:34] <icinga-wm>	 RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[09:21:13] <icinga-wm>	 RECOVERY - puppet last run on mw2130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:21:24] <icinga-wm>	 RECOVERY - puppet last run on mw2107 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[09:21:34] <icinga-wm>	 RECOVERY - DPKG on mw2152 is OK: All packages OK
[09:25:23] <icinga-wm>	 PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: Puppet has 4 failures
[09:28:11] <_joe_>	 !log all traffic serving appservers are now running with libicu52 (T86096)
[09:28:12] <stashbot>	 T86096: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096
[09:28:18] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:33:00] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2329384 (10fgiunchedi) pillow 3.2.0-2~bpo8+1 uploaded to jessie-backports, should appear in the next few days
[09:34:47] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 04-1] "superseded by I7be2777 I think" [puppet] - 10https://gerrit.wikimedia.org/r/290797 (https://phabricator.wikimedia.org/T136206) (owner: 10Dzahn)
[09:39:53] <grrrit-wm>	 (03PS3) 10Filippo Giunchedi: assign 'c' IPs for restbase100[7-9] [puppet] - 10https://gerrit.wikimedia.org/r/290797 (https://phabricator.wikimedia.org/T136206) (owner: 10Dzahn)
[09:42:10] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] "nevermind, looks like this came first, amended to comment the hosts, now supersedes I7be2777f" [puppet] - 10https://gerrit.wikimedia.org/r/290797 (https://phabricator.wikimedia.org/T136206) (owner: 10Dzahn)
[09:42:28] <grrrit-wm>	 (03PS4) 10Filippo Giunchedi: assign 'c' IPs for restbase100[7-9] [puppet] - 10https://gerrit.wikimedia.org/r/290797 (https://phabricator.wikimedia.org/T136206) (owner: 10Dzahn)
[09:42:33] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [V: 032] assign 'c' IPs for restbase100[7-9] [puppet] - 10https://gerrit.wikimedia.org/r/290797 (https://phabricator.wikimedia.org/T136206) (owner: 10Dzahn)
[09:43:26] <grrrit-wm>	 (03Abandoned) 10Filippo Giunchedi: stub out missing 'c' instances [puppet] - 10https://gerrit.wikimedia.org/r/290800 (https://phabricator.wikimedia.org/T136206) (owner: 10Eevans)
[09:45:12] <icinga-wm>	 PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2420
[09:49:53] <icinga-wm>	 RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[09:50:12] <icinga-wm>	 RECOVERY - check_mysql on fdb2001 is OK: Uptime: 1890889 Threads: 2 Questions: 34677885 Slow queries: 11238 Opens: 1151 Flush tables: 2 Open tables: 577 Queries per second avg: 18.339 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 300
[09:54:59] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, other cassandra clusters might be interested in the change too?" [puppet] - 10https://gerrit.wikimedia.org/r/290860 (owner: 10Eevans)
[09:58:02] <grrrit-wm>	 (03Abandoned) 10Jcrespo: Remove dns entries for es2005-es2010 [dns] - 10https://gerrit.wikimedia.org/r/287645 (https://phabricator.wikimedia.org/T134755) (owner: 10Jcrespo)
[09:58:43] <_joe_>	 !log starting updateCollations.php forced run on all wikis with uca category collation
[09:58:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:01:15] <wikibugs>	 06Operations, 10ops-codfw: ms-be2012.codfw.wmnet: slot=10 dev=sdk failed - https://phabricator.wikimedia.org/T135975#2329435 (10MoritzMuehlenhoff) a:03Papaul
[10:09:14] <icinga-wm>	 PROBLEM - Host pay-lvs2002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:09:19] <icinga-wm>	 PROBLEM - Host payments2002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:09:52] <_joe_>	 srx again?
[10:09:59] <akosiaris>	 looking
[10:10:44] <jynus>	 I checked, no user impact
[10:10:46] <akosiaris>	 it's responding to pings but I've not got a shell yet
[10:11:26] <icinga-wm>	 PROBLEM - Host alnitak is DOWN: PING CRITICAL - Packet loss = 100%
[10:11:31] <icinga-wm>	 PROBLEM - Host betelgeuse is DOWN: PING CRITICAL - Packet loss = 100%
[10:11:42] <akosiaris>	 yeah, it's almost definitely the pfw
[10:12:16] <paravoid>	 goddammit
[10:12:20] <mark>	 right when faidon was ranting to me about juniper
[10:13:09] <paravoid>	 -rw-rw----  1 root  wheel          0 May 26 10:06 /var/tmp/flowd_octeon_hm.core.0.gz
[10:13:14] <paravoid>	 0-byte coredump again
[10:13:19] <akosiaris>	 yay
[10:13:36] <grrrit-wm>	 (03PS1) 10Muehlenhoff: Provide a wrapper to invoke convert using firejail [puppet] - 10https://gerrit.wikimedia.org/r/290909 
[10:14:52] <paravoid>	 plenty of available space too
[10:14:58] <paravoid>	 I ran a storage cleanup last time around
[10:15:18] <icinga-wm>	 RECOVERY - Host pay-lvs2002 is UP: PING OK - Packet loss = 0%, RTA = 34.07 ms
[10:15:23] <icinga-wm>	 RECOVERY - Host alnitak is UP: PING OK - Packet loss = 0%, RTA = 34.11 ms
[10:15:28] <icinga-wm>	 RECOVERY - Host betelgeuse is UP: PING OK - Packet loss = 0%, RTA = 33.41 ms
[10:15:35] <icinga-wm>	 RECOVERY - Host payments2002 is UP: PING OK - Packet loss = 0%, RTA = 33.68 ms
[10:16:13] <grrrit-wm>	 (03Abandoned) 10Muehlenhoff: WIP: Use firejail in image scaling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/288390 (https://phabricator.wikimedia.org/T135111) (owner: 10Muehlenhoff)
[10:16:37] <paravoid>	 I can open a case to ask juniper what the fuck is with 0-byte coredumps
[10:16:46] <akosiaris>	 JSRPD_HA_CONTROL_LINK_DOWN: HA control link monitor status is marked down
[10:17:33] <akosiaris>	 10:05:49 ^
[10:29:46] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Provide a wrapper to invoke convert using firejail [puppet] - 10https://gerrit.wikimedia.org/r/290909 (owner: 10Muehlenhoff)
[10:37:55] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.64.32.204:9042 on restbase1012 is OK: TCP OK - 0.001 second response time on port 9042
[10:38:30] <wikibugs>	 06Operations, 07Puppet, 06Labs: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#2329486 (10hashar)
[10:38:49] <wikibugs>	 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure: Hiera hierarchy hieradata/role/* is not applied on labs (eg  deployment-prep) - https://phabricator.wikimedia.org/T136080#2322414 (10hashar)
[10:38:50] <wikibugs>	 06Operations, 07Puppet, 06Labs: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#1847021 (10hashar)
[10:39:12] <wikibugs>	 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure: Hiera hierarchy hieradata/role/* is not applied on labs (eg  deployment-prep) - https://phabricator.wikimedia.org/T136080#2322414 (10hashar) Thanks @scfc marked this as a duplicate of T120165. I have copy pasted my extended task description there.
[10:39:44] <wikibugs>	 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure, 06Labs: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#1847021 (10hashar)
[10:44:24] <grrrit-wm>	 (03PS3) 10Muehlenhoff: Provide a firejail profile for the image scalers [puppet] - 10https://gerrit.wikimedia.org/r/290696 (https://phabricator.wikimedia.org/T135111) 
[10:44:51] <grrrit-wm>	 (03PS7) 10Filippo Giunchedi: prometheus: add server support [puppet] - 10https://gerrit.wikimedia.org/r/280652 (https://phabricator.wikimedia.org/T126785) 
[10:44:52] <wikibugs>	 06Operations, 07Puppet, 10Beta-Cluster-Infrastructure, 06Labs: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#2329534 (10Joe) The role keyword is used in production to refer to large groups of hosts; we DEFINITELY don't want to have role lookups in labs for the same...
[10:44:58] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: prometheus: add server support (035 comments) [puppet] - 10https://gerrit.wikimedia.org/r/280652 (https://phabricator.wikimedia.org/T126785) (owner: 10Filippo Giunchedi)
[10:45:13] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] prometheus: add server support [puppet] - 10https://gerrit.wikimedia.org/r/280652 (https://phabricator.wikimedia.org/T126785) (owner: 10Filippo Giunchedi)
[10:46:02] <grrrit-wm>	 (03PS2) 10Muehlenhoff: Provide a wrapper to invoke convert using firejail [puppet] - 10https://gerrit.wikimedia.org/r/290909 
[10:47:58] <grrrit-wm>	 (03PS8) 10Filippo Giunchedi: prometheus: add server support [puppet] - 10https://gerrit.wikimedia.org/r/280652 (https://phabricator.wikimedia.org/T126785) 
[10:49:56] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 031] Monitoring: Install vendor specific RAID tool [puppet] - 10https://gerrit.wikimedia.org/r/290717 (https://phabricator.wikimedia.org/T97998) (owner: 10Volans)
[10:52:45] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 669 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6494536 keys - replication_delay is 669
[10:53:29] <wikibugs>	 06Operations, 10RESTBase-Cassandra, 13Patch-For-Review: better cassandra process checks - https://phabricator.wikimedia.org/T108306#2329544 (10fgiunchedi)
[10:53:31] <wikibugs>	 06Operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#2329546 (10fgiunchedi)
[10:53:35] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10RESTBase, 10RESTBase-Cassandra, and 2 others: Finish conversion to multiple Cassandra instances per hardware node - https://phabricator.wikimedia.org/T95253#2329542 (10fgiunchedi) 05Open>03Resolved I agree this is complete, let's followup on T134016, resolving!
[10:56:46] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6482710 keys - replication_delay is 0
[11:00:05] <wikibugs>	 06Operations, 07HHVM, 13Patch-For-Review, 07User-notice: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096#2329554 (10Joe) Upgrade is done and scripts are running. Sadly, while some are exceeding my conservative evaluation of performance, frwiki is running around 1...
[11:02:21] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Provide a wrapper to invoke convert using firejail [puppet] - 10https://gerrit.wikimedia.org/r/290909 (owner: 10Muehlenhoff)
[11:09:24] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2329564 (10elukey) Fixed the issue with:  ``` sudo megacli -PDMakeGood -PhysDrv '[32:2]' -Force -a0 sudo megacli -CfgLdAdd -r0 [32:2] -a0 ```  After the reboot the /dev/sdd di...
[11:15:31] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2329584 (10Gilles)
[11:15:49] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] "+1" [puppet] - 10https://gerrit.wikimedia.org/r/289236 (https://phabricator.wikimedia.org/T132747) (owner: 1020after4)
[11:20:14] <icinga-wm>	 PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2067
[11:21:59] <wikibugs>	 06Operations, 10cassandra: Grafana bugginess; Graph scales sometimes off by an order of magnitude - https://phabricator.wikimedia.org/T121789#2329590 (10fgiunchedi) I think it comes from statsd recommendation on how to aggregate graphite metrics (https://github.com/etsy/statsd/blob/master/docs/graphite.md). Re...
[11:24:36] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2329593 (10Gilles) The only remaining dependency, the upstream update of scikit-image is proving difficult. The package is massive, its packaging is complicated. It has...
[11:25:14] <icinga-wm>	 RECOVERY - check_mysql on fdb2001 is OK: Uptime: 1896589 Threads: 1 Questions: 34732919 Slow queries: 11404 Opens: 1153 Flush tables: 2 Open tables: 577 Queries per second avg: 18.313 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[11:27:04] <wikibugs>	 06Operations, 10cassandra: change graphite aggregation function for cassandra 'count' metrics - https://phabricator.wikimedia.org/T121789#2329595 (10fgiunchedi)
[11:53:44] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 714 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6488123 keys - replication_delay is 714
[11:55:41] <moritzm>	 !log rebooting mx2001 for kernel update to Linux 4.4
[11:55:46] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 6481588 keys - replication_delay is 0
[11:55:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:07:10] <moritzm>	 !log rolling reboot of restbase-test cluster
[12:07:16] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:13:18] <wikibugs>	 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2329671 (10Gilles) @fgiunchedi pointed out that pyssim has no license: https://github.com/jterrace/pyssim/issues/14  I've tracked down the original author of the code t...
[12:24:44] <grrrit-wm>	 (03PS1) 10Mobrovac: RESTBase: use the appropriate logger name [puppet] - 10https://gerrit.wikimedia.org/r/290922 (https://phabricator.wikimedia.org/T103124) 
[12:25:26] <icinga-wm>	 PROBLEM - Host es2017 is DOWN: PING CRITICAL - Packet loss = 100%
[12:26:10] <grrrit-wm>	 (03CR) 10Ppchelko: [C: 031] RESTBase: use the appropriate logger name [puppet] - 10https://gerrit.wikimedia.org/r/290922 (https://phabricator.wikimedia.org/T103124) (owner: 10Mobrovac)
[12:26:56] <grrrit-wm>	 (03CR) 10Mobrovac: "PCC confirms that's it - https://puppet-compiler.wmflabs.org/2931/ :)" [puppet] - 10https://gerrit.wikimedia.org/r/290922 (https://phabricator.wikimedia.org/T103124) (owner: 10Mobrovac)
[12:29:06] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] RESTBase: use the appropriate logger name [puppet] - 10https://gerrit.wikimedia.org/r/290922 (https://phabricator.wikimedia.org/T103124) (owner: 10Mobrovac)
[12:31:33] <jynus>	 !log updating user table on labswiki to fix incorrect encoding T131630
[12:31:34] <stashbot>	 T131630: Tgr unable to login on Horizon - https://phabricator.wikimedia.org/T131630
[12:31:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:33:24] <jynus>	 did es2017 crash / lose network?
[12:36:27] <jynus>	 on the serial console it looks like a kernel crash
[12:39:47] <jynus>	 !log powercycling es2017
[12:39:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:42:12] <icinga-wm>	 RECOVERY - Host es2017 is UP: PING OK - Packet loss = 0%, RTA = 34.47 ms
[12:47:36] <wikibugs>	 06Operations, 10ops-eqiad, 06Analytics-Kanban, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2329742 (10elukey)
[12:48:36] <wikibugs>	 06Operations, 10ops-eqiad, 06Analytics-Kanban, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2253604 (10elukey) a:05Cmjohnson>03elukey
[13:05:10] <wikibugs>	 06Operations, 10RESTBase-Cassandra, 06Services, 10cassandra: Cleanup Graphite Cassandra metrics - https://phabricator.wikimedia.org/T132771#2329809 (10fgiunchedi) 05Open>03Resolved resolving as cassandra metrics are cleaned up now
[13:10:43] <wikibugs>	 06Operations, 10DBA: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2329826 (10jcrespo)
[13:12:31] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review, 07Performance: Lots of Title::purgeExpiredRestriction from API DELETE FROM `page_restrictions` WHERE (pr_expiry < '20160517063108') without batching/throttling potentially causing lag on s5-api - https://phabricator.wikimedia.org/T135470#2329838 (10jcrespo) I wo...
[13:12:49] <wikibugs>	 06Operations, 10DBA: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2329840 (10jcrespo) 05Open>03Resolved
[13:26:41] <wikibugs>	 06Operations, 10Monitoring, 10netops, 03Scap3 (Scap3-Adoption-Phase1): Deploy libreNMS with scap3 - https://phabricator.wikimedia.org/T129136#2329902 (10faidon) a:03akosiaris
[13:30:54] <wikibugs>	 06Operations, 10ops-eqiad, 06Analytics-Kanban, 06DC-Ops: I/O issues for /dev/sdd on analytics1047.eqiad.wmnet - https://phabricator.wikimedia.org/T134056#2329937 (10elukey) Added some documentation in:  https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hadoop/Administration#Swapping_broken_disk
[13:35:13] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2329955 (10jcrespo) p:05Normal>03High es2017 just (crashed?) at 12:25 today, I do not think that is unrelated.
[13:35:24] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2329958 (10jcrespo) 05stalled>03Open
[13:37:20] <Krenair>	 bd808, hey you know mwscriptwikiset?
[13:44:25] <grrrit-wm>	 (03PS1) 10Elukey: Allow float result for int/int division in gmond's memcached module. [puppet] - 10https://gerrit.wikimedia.org/r/290933 
[13:50:02] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2330063 (10jcrespo) Nothing on syslog:  ``` May 26 12:05:01 es2017 CRON[135137]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1) May 26 12:15:01 es2017 CRON[135912]: (root) CMD (command...
[13:52:22] <jynus>	 !log restarting es2017 for kernel upgrade
[13:52:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:55:05] <grrrit-wm>	 (03PS1) 10Ottomata: Update otto's iterm2 shell integration script [puppet] - 10https://gerrit.wikimedia.org/r/290934 
[13:56:40] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Update otto's iterm2 shell integration script [puppet] - 10https://gerrit.wikimedia.org/r/290934 (owner: 10Ottomata)
[14:04:24] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2330100 (10jcrespo)
[14:05:16] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2330101 (10hashar)
[14:05:31] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: rsyslog::receiver: Increase log retention to 90 days [puppet] - 10https://gerrit.wikimedia.org/r/290935 
[14:09:32] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2330125 (10hashar) For what it is worth the mirror status page from 19 hours ago shows that Trusty is lagging behind https://launchpad.net/ubuntu/+mirror/wikimedia-archive   {F4057150 size=full}
[14:12:36] <milimetric>	 for fun and just in case someone here deals with typo-squatting, check out the random attacks at store.wikipeda.org
[14:13:11] <milimetric>	 (asks for location, to install add-ons, etc.  it's like a kitchen sink of simple hacks)
[14:14:30] <Krenair>	 domains are handled by legal
[14:14:35] <wikibugs>	 06Operations, 10media-storage, 13Patch-For-Review: swift capacity planning - https://phabricator.wikimedia.org/T1268#2330165 (10fgiunchedi) another factor for swift capacity planning purposes is space allocated for different container types, most importantly thumbs and originals (69T vs 89T)  [[ htt...
[14:16:21] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 031] rsyslog::receiver: Increase log retention to 90 days [puppet] - 10https://gerrit.wikimedia.org/r/290935 (owner: 10Alexandros Kosiaris)
[14:21:38] <wikibugs>	 06Operations, 06Discovery, 10Maps: Configure monitoring / alerting of Postgresql / redis / ... cluster for maps - https://phabricator.wikimedia.org/T135647#2330195 (10Gehel)
[14:21:40] <grrrit-wm>	 (03PS1) 10Hashar: contint: let us vary localhost vhost unix user [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) 
[14:22:09] <wikibugs>	 06Operations, 06Discovery, 10Maps: Configure monitoring / alerting of Postgresql / redis / ... cluster for maps - https://phabricator.wikimedia.org/T135647#2305596 (10Gehel) Kartotherian check could be an HTTP check on https://maps.wikimedia.org/osm-intl/0/0/0.png (or the equivalent on localhost)
[14:23:06] <grrrit-wm>	 (03CR) 10Alex Monk: "possibly, but not for scap itself AFAIK:" [puppet] - 10https://gerrit.wikimedia.org/r/290348 (owner: 10Alex Monk)
[14:24:54] <grrrit-wm>	 (03PS1) 10Ottomata: Add druid100[123] with just standard and base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/290940 (https://phabricator.wikimedia.org/T134275) 
[14:26:27] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Add druid100[123] with just standard and base::firewall [puppet] - 10https://gerrit.wikimedia.org/r/290940 (https://phabricator.wikimedia.org/T134275) (owner: 10Ottomata)
[14:27:30] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2330213 (10jcrespo) I think I got it:  ``` "Normal","Mon Feb 08 2016 16:06:18","Log cleared." "Critical","Thu May 26 2016 12:22:06","CPU 2 has an internal error (IERR)." "Normal","Thu May 26 2...
[14:28:31] <ottomata>	 robh, yt?
[14:30:47] <bd808>	 Krenair: I haven't looked at or used mwscriptwikiset, no
[14:30:58] <Krenair>	 it's like foreachwikiindblist
[14:31:00] <Krenair>	 but different
[14:31:37] <Krenair>	 they do very similar things
[14:32:05] <grrrit-wm>	 (03CR) 10Rush: contint: let us vary localhost vhost unix user (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar)
[14:32:10] <grrrit-wm>	 (03PS2) 10Rush: contint: let us vary localhost vhost unix user [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar)
[14:32:18] <bd808>	 Looking now. Very very similar
[14:32:34] <hashar>	 chasemp: neat :)
[14:32:44] <bd808>	 Krenair: should we figure out how to combine them?
[14:33:00] <bd808>	 The difference seems to be the output prefixing mostly
[14:34:13] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2330218 (10Ottomata) > what does 'update install_server module' mean? Oh duh, it means stick in some MAC addies and some partman.  Ok, I'm working on this.  @Cmjohnson druid1003 do...
[14:34:40] <Krenair>	 I think I wrote a task for it months ago
[14:34:54] <Krenair>	 https://phabricator.wikimedia.org/T109798
[14:37:12] <wikibugs>	 06Operations, 06Labs, 06Project-Admins: Archive old Incident-* projects - https://phabricator.wikimedia.org/T134624#2330224 (10Danny_B)
[14:37:32] <wikibugs>	 06Operations, 10ops-eqiad: Wipe wmf4727 - https://phabricator.wikimedia.org/T136309#2330226 (10akosiaris)
[14:37:43] <wikibugs>	 06Operations, 10ops-eqiad: Wipe wmf4727 - https://phabricator.wikimedia.org/T136309#2330238 (10akosiaris) p:05Triage>03High
[14:39:28] <wikibugs>	 06Operations, 10ops-eqiad: Wipe wmf4727 - https://phabricator.wikimedia.org/T136309#2330242 (10faidon)
[14:39:30] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2330244 (10faidon)
[14:40:40] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2045136 (10faidon) Folks, having a wmfNNNN server set up like that for such a long time and not being cleaned up properly is a problem for security and general maintenance re...
[14:40:42] <ilevy>	 I'm looking at an alerting script, check_graphite, from operations-puppet.git. It looks like there was a temporary version created locally on a server that's still in use, and the issue was prematurely marked as resolved
[14:41:10] <ilevy>	 issue link https://phabricator.wikimedia.org/T116035 temp file mentioned here https://gerrit.wikimedia.org/r/#/c/255415/
[14:41:11] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2330264 (10jcrespo) a:05jcrespo>03Papaul For es2017, CPU seems to have failed:  ``` CPU 2 Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz E5 2600 MHz IERR 10 ```  Memory show currently as ok, bu...
[14:41:39] <wikibugs>	 06Operations, 06Labs, 06Project-Admins: Archive old Incident-* projects - https://phabricator.wikimedia.org/T134624#2330268 (10Danny_B)
[14:44:06] <chasemp>	 ilevy: you probably want to ping ottomata considering that changeset^
[14:44:08] <grrrit-wm>	 (03CR) 10Hashar: contint: let us vary localhost vhost unix user (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar)
[14:44:21] <grrrit-wm>	 (03PS3) 10Hashar: contint: let us vary localhost vhost unix user [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) 
[14:44:45] <wikibugs>	 06Operations, 06Labs, 06Project-Admins: Archive old Incident-* projects - https://phabricator.wikimedia.org/T134624#2330287 (10Danny_B)
[14:47:31] <grrrit-wm>	 (03CR) 10Rush: [C: 032] contint: let us vary localhost vhost unix user [puppet] - 10https://gerrit.wikimedia.org/r/290938 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar)
[14:47:34] <ottomata>	 oof not remembering, but I see at least in cache/kafka/webrequest.pp , the conditional no longer is present
[14:47:41] <ottomata>	 (even though the comment is)
[14:47:58] <ilevy>	 I reopened the issue
[14:48:15] <grrrit-wm>	 (03PS1) 10Ottomata: Add netboot MACs and partman recipe for druid hosts [puppet] - 10https://gerrit.wikimedia.org/r/290944 (https://phabricator.wikimedia.org/T134275) 
[14:48:20] <ilevy>	 modules/nagios_common/files/check_commands/check_graphite.cfg still refers to the local script
[14:48:24] <ilevy>	 so I think it might still be used
[14:48:33] <ilevy>	 since --until isn't in the checked in version but it's used
[14:48:53] <ilevy>	 I noticed because my company is also using this script; I wanted --until support and noticed you guys added it in git and then reverted it
[14:49:04] <ottomata>	 hm thanks ilevy yeah looks like that one fell through the cracks
[14:49:31] <wikibugs>	 06Operations: encrypt syslog traffic - https://phabricator.wikimedia.org/T136312#2330299 (10fgiunchedi)
[14:51:02] <grrrit-wm>	 (03PS1) 10Eevans: enable instance restbase1014-c.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/290945 (https://phabricator.wikimedia.org/T134016) 
[14:52:04] <grrrit-wm>	 (03PS2) 10Ottomata: Add netboot MACs and partman recipe for druid hosts [puppet] - 10https://gerrit.wikimedia.org/r/290944 (https://phabricator.wikimedia.org/T134275) 
[14:52:36] <urandom>	 Could I get someone to merge https://gerrit.wikimedia.org/r/#/c/290945/?  It adds a new Cassandra instance for bootstrap.
[14:52:47] <wikibugs>	 06Operations, 10vm-requests: eqiad/codfw: 1 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2330317 (10fgiunchedi)
[14:53:26] <ottomata>	 urandom:  that host exists?
[14:53:35] <ottomata>	  / this is safe for me to merge?
[14:53:38] <urandom>	 yup!
[14:54:07] <urandom>	 ottomata: those entries were all laid out ahead of time
[14:54:26] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] enable instance restbase1014-c.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/290945 (https://phabricator.wikimedia.org/T134016) (owner: 10Eevans)
[14:54:28] <urandom>	 uncommenting them just creates the config for that instance, so we can start the bootstrap
[14:54:49] <ottomata>	 done.
[14:54:53] <urandom>	 ottomata: thank you!
[14:54:58] <grrrit-wm>	 (03PS3) 10Ottomata: Add netboot MACs and partman recipe for druid hosts [puppet] - 10https://gerrit.wikimedia.org/r/290944 (https://phabricator.wikimedia.org/T134275) 
[14:55:06] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add netboot MACs and partman recipe for druid hosts [puppet] - 10https://gerrit.wikimedia.org/r/290944 (https://phabricator.wikimedia.org/T134275) (owner: 10Ottomata)
[14:57:19] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2330334 (10jcrespo) es2019 seems to had suffered the same cpu and memory errors:   ``` MEM0001: Multi-bit memory errors detected on a memory device at location(s) DIMM_A1.  2016-04-22T14:48:59...
[14:58:48] <wikibugs>	 06Operations, 06Analytics-Kanban, 10Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2330347 (10elukey) p:05Triage>03High
[14:59:04] <wikibugs>	 06Operations, 06Labs, 10Labs-Infrastructure: rcstream not working for wikitech wiki - https://phabricator.wikimedia.org/T136245#2330352 (10Krenair) I think we might need to change `@resolve(wikitech.wikimedia.org)` to `@resolve(wikitech.wikimedia.org, AAAA)`
[15:00:04] <jouncebot>	 anomie ostriches thcipriani marktraceur aude: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T1500). Please do the needful.
[15:00:04] <jouncebot>	 Pchelolo: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:02:08] <wikibugs>	 06Operations: Race condition in setting net.netfilter.nf_conntrack_tcp_timeout_time_wait - https://phabricator.wikimedia.org/T136094#2330361 (10MoritzMuehlenhoff) So the problem occurs whenever /etc/sysctl.d/70-ferm_conntrack.conf is processed before ferm has been started (which loads the nf_conntrack kernel mod...
[15:02:09] <thcipriani>	 I can SWAT today. Pchelolo ping me when you're around.
[15:02:19] <Pchelolo>	 thcipriani: I'm here
[15:02:30] <thcipriani>	 okie doke
[15:06:21] <urandom>	 !log Bootstrapping restbase1014-c.eqiad.wmnet : T134016
[15:06:22] <stashbot>	 T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[15:06:29] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:06:32] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.64.48.137:9042 on restbase1014 is CRITICAL: Connection refused
[15:06:43] <urandom>	 expected; got this ^^
[15:07:33] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c CQL 10.64.48.137:9042 on restbase1014 is CRITICAL: Connection refused eevans Node is bootstrapping. - The acknowledgement expires at: 2016-05-27 15:07:13.
[15:07:35] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.3/extensions/EventBus: SWAT: [[gerrit:290906|Use getPrefixedURL and getPrefixedDBkey instead of getText]] (duration: 00m 35s)
[15:07:42] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:07:43] <thcipriani>	 ^ Pchelolo check please
[15:08:00] <robh>	 ottomata: here now, sup?
[15:08:25] <Pchelolo>	 thcipriani: one moment
[15:12:12] <urandom>	 !log Starting cleanup of restbase1012-a.eqiad.wmnet
[15:12:20] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:12:40] <ottomata>	 robh:  hiyaa, was going to ask about some install server stuff, i got through it though!
[15:12:50] <ottomata>	 except, druid1003's mgmt doesn't seem to be responding
[15:13:00] <ottomata>	 druid1001 and 1002 are, and i think they are installing os now :o
[15:13:07] <ottomata>	 hm, or maybe not
[15:13:21] <ottomata>	 ahhh actually they are not, they just keep saying
[15:13:23] <ottomata>	 May 26 15:13:19 carbon dhcpd: DHCPDISCOVER from 14:02:ec:06:8b:ec via 10.64.36.3: network 10.64.36.0/24: no free leases
[15:14:26] <Pchelolo>	 thcipriani: all's good, thank you
[15:14:39] <thcipriani>	 Pchelolo: cool. Thanks for checking :)
[15:16:41] <ottomata>	 robh:  hmm maybe I configured something wrong
[15:16:42] <ottomata>	 i see
[15:16:49] <robh>	 no free leases means either the dns isnt right
[15:16:51] <ottomata>	 DHCPDISCOVER from 1c:98:ec:29:e2:98 via 10.64.5.3: network 10.64.5.0/24: no free leases for druid1001
[15:16:51] <robh>	 or the vlan isnt right
[15:16:54] <ottomata>	 and
[15:16:56] <ottomata>	 in dns
[15:16:58] <robh>	 should these be eqiad or wikimedia.org?
[15:17:03] <ottomata>	 it is 10.64.0.163
[15:17:03] <ottomata>	 eqiad
[15:17:10] <ottomata>	 wmnet
[15:17:12] <robh>	 ok, thats the right network, has carbon gotten the update?
[15:17:16] <ottomata>	 yes
[15:17:50] <robh>	 puppets disabled on carbon
[15:17:56] <robh>	 but checking to see if it has the update
[15:18:09] <ottomata>	 hm, i ran puppet before I tried booting and saw my commit applied
[15:18:13] <Krinkle>	 thcipriani: I'd like to push https://gerrit.wikimedia.org/r/#/c/290710/ out in the SWAT
[15:18:15] <ottomata>	 but, robh, that is the right network?
[15:18:24] <robh>	 checking stuff now
[15:18:35] <robh>	 i have a checklist, i dont skip around it ;]
[15:18:41] <thcipriani>	 bd808: are you fine with https://gerrit.wikimedia.org/r/#/c/290867/1 going out with SWAT?
[15:18:43] <ottomata>	 10.64.0.163 is not in 10.64.5.0/24, is it?
[15:18:48] <thcipriani>	 Krinkle: okie doke
[15:18:49] <ottomata>	 haha ok robh
[15:19:08] <robh>	 which host is this specifically we are looking at?
[15:19:12] <bd808>	 thcipriani: yeah if it looks good to you do it :)
[15:19:13] <robh>	 you mentioned like 3 and then some output 
[15:19:14] <robh>	 ;]
[15:19:27] <robh>	 druid1001 ?
[15:19:27] <ottomata>	 robh, both druid1001 and druid1002
[15:19:33] <robh>	 ok, lets stick with druid1001 for now
[15:19:34] <ottomata>	 k
[15:19:39] <ottomata>	 May 26 15:19:33 carbon dhcpd: DHCPDISCOVER from 1c:98:ec:2a:a1:50 via 10.64.32.3: network 10.64.32.0/22: no free leases
[15:19:42] <ottomata>	 is druid1001
[15:19:44] <ottomata>	 looking
[15:19:57] <ottomata>	 but in dns it has 10.64.0.163
[15:20:19] <robh>	 1C:98:EC:29:E2:98
[15:20:24] <robh>	 so those macs dont match
[15:20:35] <robh>	 druid1001 has a mac address in the lease file of 1C:98:EC:29:E2:98
[15:20:41] <ottomata>	 wait
[15:20:43] <kart_>	 !log Update cxserver to b431aef
[15:20:43] <robh>	 and you just pasted a wholly different mac address
[15:20:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:21:00] <ottomata>	 i think i pasted you a wrong log message...
[15:21:19] <ottomata>	 this is what i put in the linux-host-entries file
[15:21:19] <ottomata>	 1C:98:EC:29:E2:98
[15:21:23] <ottomata>	 right
[15:21:30] <ottomata>	 sorry robh
[15:21:34] <ottomata>	 wrong log entry
[15:21:34] <ottomata>	 this one
[15:21:35] <ottomata>	 May 26 15:19:46 carbon dhcpd: DHCPDISCOVER from 1c:98:ec:29:e2:98 via 10.64.5.2: network 10.64.5.0/24: no free leases
[15:21:37] <ottomata>	 is druid1001
[15:22:30] <robh>	 ok, so they match on mac
[15:22:32] <robh>	 checking dns
[15:24:05] <grrrit-wm>	 (03PS2) 10Thcipriani: CommonSettings: cleanup temp cache file if rename fails [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290867 (https://phabricator.wikimedia.org/T136258) (owner: 10BryanDavis)
[15:24:41] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290867 (https://phabricator.wikimedia.org/T136258) (owner: 10BryanDavis)
[15:25:02] <robh>	 well, the dns shows its in private1-a-eqiad
[15:25:07] <robh>	 but your switch config has it in analytics1-a-eqiad
[15:25:11] <robh>	 which would explain this
[15:25:14] <robh>	 ottomata: ^
[15:25:17] <grrrit-wm>	 (03Merged) 10jenkins-bot: CommonSettings: cleanup temp cache file if rename fails [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290867 (https://phabricator.wikimedia.org/T136258) (owner: 10BryanDavis)
[15:25:32] <robh>	 so you are trying to hand out an IP lease over a subnet that isnt allowed to do so for another subnet
[15:25:48] <thcipriani>	 Krinkle: ^ going to push that while waiting for jenkins, FYI
[15:25:50] <ottomata>	 robh hm, ok, i didn't do the dns
[15:25:56] <ottomata>	 this should be in analytics vlan
[15:26:01] <Krinkle>	 thcipriani: Ok
[15:26:03] <ottomata>	 so, do I just need to pick a diff dns?
[15:26:08] <robh>	 ottomata: yep, well, that explains the answer.  you need to redo your dns to move it into the right part of the file
[15:26:12] <robh>	 and your production dns entries will change
[15:26:13] <ottomata>	 ok
[15:26:17] <ottomata>	 cool, doing...
[15:26:27] <robh>	 anytime its no free leases its usually a dns/vlan thing
[15:26:36] <robh>	 just hard to diagnose without logging into switch stack =]
[15:26:57] <robh>	 so yeah, accidental dns mismatch on setup (if i did dns, sorry about that ;)
[15:27:08] <robh>	 i dont recall if i did, so many systems setup recently, heh.
[15:27:38] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:290867|CommonSettings: cleanup temp cache file if rename fails]] (duration: 00m 30s)
[15:27:44] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:30:23] <ottomata>	 haha
[15:30:25] <ottomata>	 ja dunno either
[15:30:35] <grrrit-wm>	 (03PS1) 10Ottomata: Move druid entries into analytics vlans [dns] - 10https://gerrit.wikimedia.org/r/290954 (https://phabricator.wikimedia.org/T134275) 
[15:30:36] <ottomata>	 robh, look better? ^
[15:31:10] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.3/resources/src/mediawiki.special/mediawiki.special.search.css: SWAT: [[gerrit:290710|Fix regression: text color in .mw-search-result-data (duration: 00m 27s)
[15:31:17] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:31:33] <thcipriani>	 ^ Krinkle sync'd
[15:31:52] <Krinkle>	 thcipriani: Thanks. Confirmed fix
[15:32:59] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Move druid entries into analytics vlans [dns] - 10https://gerrit.wikimedia.org/r/290954 (https://phabricator.wikimedia.org/T134275) (owner: 10Ottomata)
[15:34:39] <ottomata>	 robh, one more thing
[15:34:46] <ottomata>	 i can't access druid1003.mgmt.eqiad.wmnet
[15:34:48] <ottomata>	 so I can't find its mac
[15:37:12] <icinga-wm>	 PROBLEM - Disk space on ms-be2012 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdm3 is not accessible: Input/output error
[15:37:55] <ottomata>	 hmm, robh.  maybe i'm not patient enough.  druid1001 from carbon now resolves to 10.64.5.101
[15:37:59] <ottomata>	 that's what I want
[15:38:11] <ottomata>	 still getting DHCPDISCOVER from 1c:98:ec:29:e2:98 via 10.64.5.2: network 10.64.5.0/24: no free leases
[15:39:36] <ottomata>	 cmjohnson1: ahhh you are here! :)
[15:41:10] <robh>	 ottomata: huh?
[15:41:19] <robh>	 the dns is wrong in git
[15:41:50] <robh>	 ottomata: so not sure how you mean it now resolves to another ip?
[15:42:04] <robh>	 lets focus on just one machine at a time please
[15:42:12] <robh>	 (im intentionally ignoring the issue on 2003)
[15:42:14] <robh>	 sorry 1003
[15:42:30] <robh>	 ottomata: So are you saying now druid1001 gets a lease from carbon?
[15:42:35] <robh>	 (that shouldnt be possible)
[15:43:22] <ottomata>	 ok
[15:43:24] <ottomata>	 no
[15:43:25] <ottomata>	 robh
[15:43:25] <robh>	 same with 1002
[15:43:26] <ottomata>	 what?
[15:43:30] <ottomata>	 the dns is wrong in git?
[15:43:33] <ottomata>	 i just changed it
[15:43:36] <robh>	 oh, ok
[15:43:44] <cmjohnson1>	 @ottomata what's up?
[15:43:45] <ottomata>	 https://gerrit.wikimedia.org/r/#/c/290954/
[15:43:46] <robh>	 lemme repull
[15:44:08] <robh>	 ottomata: So I'm not sure exactly what step you are on.  is druid1001 now getting a lease
[15:44:09] <robh>	 ?
[15:44:13] <ottomata>	 no
[15:44:14] <ottomata>	 its not
[15:44:21] <ottomata>	 i changed the dns so that it is now in the analytics vlan
[15:44:24] <robh>	 well, puppet is disabled on carbon
[15:44:24] <ottomata>	 and updated it on ns0
[15:44:32] <ottomata>	 does puppet need to run after a dns change?
[15:44:34] <robh>	 hrmm, shouldnt matter actually
[15:44:36] <ottomata>	 ja
[15:44:37] <ottomata>	 so
[15:44:39] <ottomata>	 from carbon
[15:44:43] <robh>	 except the old ip is likely cached
[15:44:48] <ottomata>	 dig druid1001.eqiad.wmnet shows my change
[15:45:03] <robh>	 you should hop on to any of the pdns recursors in eqiad and rec_control wipe-cache druid1001.eqiad.wmnet
[15:45:12] <robh>	 it likely has the old entries cached, so carbon doesnt know to get the new one
[15:45:16] <ottomata>	 ns0,1,2?
[15:45:32] <robh>	 negative
[15:45:35] <robh>	 looking them up now
[15:45:55] <jynus>	 !log applying schema change to s3 hosts echo wikis T135699
[15:45:56] <stashbot>	 T135699: Schema changes for Echo moderation - https://phabricator.wikimedia.org/T135699
[15:46:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:46:09] <robh>	 chromium|hydrogen
[15:46:21] <robh>	 ottomata: so hop on either one of those machines (grepped out of site.pp) chromium|hydrogen
[15:46:30] <robh>	 and run rec_control wipe-cache druid1001.eqiad.wmnet
[15:46:33] <robh>	 it likely has stuff to clear out
[15:46:52] <robh>	 Alternatively, if you walked away from it in frustration, the dns cache would expire eventually ;D
[15:46:54] <ottomata>	 ok
[15:46:56] <ottomata>	 ha yeah
[15:47:07] <ottomata>	 ok, done, let's see what happens...
[15:47:18] <robh>	 then reboot it into pxe and (non)profit?
[15:47:35] <ottomata>	 robh, it seems to be stuck in reboot from network cycle
[15:47:38] <ottomata>	 so it keeps trying
[15:50:15] <robh>	 not sure what you mean
[15:50:30] <robh>	 dont we want it network booting right now?
[15:53:05] <ottomata>	 yes
[15:53:06] <ottomata>	 we do
[15:53:11] <ottomata>	 i mean i don't have to go in and make it do it
[15:53:31] <ottomata>	 hm, robh ya still the same
[15:53:35] <ottomata>	 DHCPDISCOVER from 1c:98:ec:29:e2:98 via 10.64.5.2: network 10.64.5.0/24: no free leases
[15:54:09] <ottomata>	 cmjohnson1:  robh is helping me with druid dns/dhcp issues
[15:54:11] <ottomata>	 but, also
[15:54:15] <ottomata>	 druid1003's mgmt doesn't respond
[15:55:46] <robh>	 im looking into 1001 settings now
[15:55:52] <robh>	 ottomata: i'll be rebooting it likely
[15:55:54] <ottomata>	 k
[15:55:55] <ottomata>	 np
[15:55:55] <wikibugs>	 06Operations, 10Traffic, 07HTTPS, 05MW-1.27-release-notes, 13Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#1451756 (10Danmichaelo) >>! In T105794#2314347, @bd808 wrote: > @Steinsplitter reported to me on irc that >> for protocol relative urls in mwclient, scheme='htt...
[15:55:56] <robh>	 and will hop on its console
[15:56:00] <ottomata>	 i will get out of console
[15:56:07] <ottomata>	 haha, actually
[15:56:09] <ottomata>	 not sure how...
[15:56:11] <ottomata>	 on these
[15:56:19] <ottomata>	 OH!
[15:56:19] <ottomata>	 i got it
[15:56:23] <ottomata>	 not sure what i did
[15:56:24] <ottomata>	 i think esc )
[15:56:38] <ottomata>	 k i'm out
[15:57:27] <robh>	 robh@iron:~$ host druid1001.eqiad.wmnet
[15:57:27] <robh>	 druid1001.eqiad.wmnet has address 10.64.0.163
[15:57:37] <robh>	 so some dns still has the other entry
[15:57:43] <robh>	 you ran the wipe on chromium right?
[15:57:46] <robh>	 lemme try hydrogen
[15:58:13] <wikibugs>	 06Operations, 10cassandra, 13Patch-For-Review: Assign 'c' instance IPs for restbase100[7-9].eqiad.wmnet - https://phabricator.wikimedia.org/T136206#2330620 (10Eevans) a:03Dzahn
[15:58:26] <grrrit-wm>	 (03CR) 10Luke081515: [C: 031] Changetags should be granted only to sysops and bots in ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/290680 (https://phabricator.wikimedia.org/T136187) (owner: 10Urbanecm)
[15:58:28] <robh>	 robh@hydrogen:~$ sudo rec_control wipe-cache druid1001.eqiad.wmnet
[15:58:28] <robh>	 wiped 1 records, 1 negative records
[15:58:51] <robh>	 ottomata:  you did that via sudo right?
[15:59:06] <robh>	 i had negative records on both of the eqiad recursors (hydrogen and chromium) but now wiped
[15:59:07] <wikibugs>	 06Operations, 10cassandra, 13Patch-For-Review: Assign 'c' instance IPs for restbase100[7-9].eqiad.wmnet - https://phabricator.wikimedia.org/T136206#2326678 (10Eevans) 05Open>03Resolved With https://gerrit.wikimedia.org/r/290797 merged, this is now complete I think; Thanks @Dzahn !
[15:59:12] <robh>	 rebooting and seeing if it works now
[16:00:04] <jouncebot>	 godog moritzm: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T1600). Please do the needful.
[16:00:04] <jouncebot>	 Dereckson: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[16:00:24] <robh>	 ok, its rebooting now
[16:00:26] <robh>	 we shall see
[16:00:43] <robh>	 ottomata: that was likely my bad in advising, i assumed you could clear on one of the recursors and it would take effect on the other
[16:00:45] <grrrit-wm>	 (03CR) 10Luke081515: [C: 031] Enable DynamicPageList extension on te.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/285009 (https://phabricator.wikimedia.org/T104163) (owner: 10Urbanecm)
[16:00:51] <robh>	 but perhaps it doesnt, as long as you ran it as sudo
[16:00:57] <robh>	 it should have worked.
[16:01:24] <robh>	 ottomata: if you have 1002 booting, kill it so it doesnt clutter our logs
[16:01:30] <robh>	 i see a bunch of stuff hitting
[16:02:14] <ottomata>	 robh, ja 
[16:02:23] <ottomata>	 [@chromium:~] $ sudo  rec_control wipe-cache druid1001.eqiad.wmnet
[16:02:23] <ottomata>	 wiped 2 records, 0 negative records
[16:02:30] <robh>	 it works
[16:02:36] <ottomata>	 ah ok
[16:02:37] <robh>	 druid1001 is booting now into installer
[16:02:40] <Dereckson>	 (patch isn't mergeable right now, we lost Wikimedia CH server for that)
[16:02:43] <ottomata>	 nice!
[16:02:47] <robh>	 so yeah, turns out you have to kill the negative cache on both recursors in a given site
[16:02:49] <ottomata>	 robh doing clear on both for 1002
[16:02:56] <robh>	 ottomata: sorry about that!
[16:03:10] <robh>	 so yeah, just fyi, if you were installing in codfw, its two different servers ;]
[16:03:17] <robh>	 but just grepping site.pp for recursor will show you them
[16:03:19] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/289236 (https://phabricator.wikimedia.org/T132747) (owner: 1020after4)
[16:03:32] <godog>	 thcipriani twentyafterfour ^ good to merge cc akosiaris 
[16:03:33] <ottomata>	 we'll see if partman works
[16:03:41] <robh>	 i disconnected from 1001
[16:03:46] <ottomata>	 ok
[16:03:46] <robh>	 its all yours (i left in the installer run)
[16:04:30] <robh>	 ottomata: learn something new daily right?  So that solves the no free leases issue. =]
[16:04:59] <ottomata>	 ha, ja!  thank you
[16:05:03] <ottomata>	 am watching installers now
[16:05:09] <ottomata>	 robh, any idea about 1003's mgmt?
[16:05:40] <thcipriani>	 godog: \o/ should be good from my perspective as long as all the keys are correct in secrets. I can test (mwdeploy at least) if it's merged
[16:07:26] <godog>	 thcipriani: ack, I'll merge
[16:07:48] <cmjohnson1_>	 ottomata: druid1003 mgmt is working
[16:07:58] <grrrit-wm>	 (03PS30) 10Filippo Giunchedi: keyholder key cleanup [puppet] - 10https://gerrit.wikimedia.org/r/289236 (https://phabricator.wikimedia.org/T132747) (owner: 1020after4)
[16:08:07] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] keyholder key cleanup [puppet] - 10https://gerrit.wikimedia.org/r/289236 (https://phabricator.wikimedia.org/T132747) (owner: 1020after4)
[16:08:37] <grrrit-wm>	 (03CR) 10Dereckson: "Wikimedia CH server is now up again." [puppet] - 10https://gerrit.wikimedia.org/r/286147 (https://phabricator.wikimedia.org/T56780) (owner: 10Dereckson)
[16:08:38] <Dereckson>	 godog: moritzm: oh http://wikimediapakistan.org/ is up again, so we can merge it now ^
[16:09:49] <ottomata>	 thanks cmjohnson1_ i'm in
[16:11:15] <godog>	 thcipriani: I'm rearming keyholder
[16:11:24] <icinga-wm>	 PROBLEM - Host wmf4727-test is DOWN: PING CRITICAL - Packet loss = 100%
[16:11:26] <thcipriani>	 ack
[16:13:22] <godog>	 thcipriani: good to go!
[16:13:29] * thcipriani tests
[16:14:20] <grrrit-wm>	 (03PS1) 10Ottomata: Add druid1003's MAC to linxu-host-entries [puppet] - 10https://gerrit.wikimedia.org/r/290962 (https://phabricator.wikimedia.org/T134275) 
[16:14:36] <thcipriani>	 godog: could you do a service restart of keyholder-proxy ?
[16:14:46] <thcipriani>	 it shouldn't require you to reload keys
[16:14:50] <thcipriani>	 it just reloads permissions
[16:14:55] <icinga-wm>	 PROBLEM - Keyholder SSH agent on tin is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it.
[16:15:31] <thcipriani>	 ^ blerg. I think I know what that's about.
[16:15:48] <godog>	 thcipriani: sure, restarted the proxy just now
[16:15:58] <godog>	 could it be lagging behind? it is armed now
[16:16:00] <thcipriani>	 godog: perfect. Working now
[16:16:15] <icinga-wm>	 PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: puppet fail
[16:16:21] <thcipriani>	 no, I think it's because we're now storing public keys in /etc/keyholder.d along with private keys.
[16:16:33] <ottomata>	 cmjohnson1_:  i can't reset boot order to disk
[16:16:35] <ottomata>	 it keeps reinstalling
[16:16:41] <ottomata>	 set /system1/bootconfig1/bootsource5 bootorder=5
[16:16:43] <wikibugs>	 06Operations, 10Traffic, 07HTTPS, 05MW-1.27-release-notes, 13Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#2330749 (10Steinsplitter) >>! In T105794#2330598, @Danmichaelo wrote: >>>! In T105794#2314347, @bd808 wrote: >> @Steinsplitter reported to me on irc that >>> fo...
[16:16:44] <ottomata>	 error_tag=INVALID TARGET
[16:16:51] <thcipriani>	 and the check just makes sure that all the files in /etc/keyholder.d are in the agent
[16:16:58] <cmjohnson1_>	 ottomata...for 1003?
[16:17:02] <cmjohnson1_>	 druid1003?
[16:17:02] <ottomata>	 no, 1001
[16:17:05] <ottomata>	 probably for all
[16:17:14] <godog>	 thcipriani: hah, makes sense, thanks
[16:17:34] <ottomata>	  /system1/bootconfig1
[16:17:34] <ottomata>	   Targets
[16:17:34] <ottomata>	     bootsource1
[16:17:34] <ottomata>	     bootsource2
[16:17:34] <ottomata>	     bootsource3
[16:17:35] <ottomata>	     bootsource4
[16:17:37] <ottomata>	   Properties
[16:17:37] <ottomata>	     oemhp_bootmode=Legacy
[16:17:37] <ottomata>	     oemhp_secureboot=Not Available
[16:17:37] <ottomata>	     oemhp_pendingbootmode=Legacy
[16:17:38] <ottomata>	 no bootsource5
[16:17:45] <cmjohnson1_>	 ottomata: it's a setting in bios
[16:18:11] <thcipriani>	 godog: patch coming shortly for that
[16:18:16] <cmjohnson1_>	 the HP comes default setting to use their UEFI and raid controller...in bios you have to change it.  I don't think I got to it for them
[16:18:42] <cmjohnson1_>	 fixing now
[16:19:20] <ottomata>	 cmjohnson1_:  i think it has bios
[16:19:22] <ottomata>	 legacy bios
[16:19:26] <ottomata>	 i got there and checked
[16:19:28] <ottomata>	 it did netboot
[16:19:56] <ottomata>	 i'm actually confused what is the current state on 1001 now
[16:20:09] <ottomata>	 hmm, i take it back
[16:20:13] <ottomata>	 it did boot from hdd
[16:20:21] <cmjohnson1_>	 it's not just legacy
[16:20:29] <ottomata>	 [    0.112862] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[16:20:29] <ottomata>	 ?
[16:20:30] <cmjohnson1_>	 there is another SATA setting that needs to be fixed
[16:20:40] <ottomata>	 mdadm: No devices listed in conf file were found.
[16:20:40] <ottomata>	 Gave up waiting for root device.  Common problems:
[16:20:40] <ottomata>	  - Boot args (cat /proc/cmdline)
[16:20:40] <ottomata>	    - Check rootdelay= (did the system wait long enough?)
[16:20:40] <ottomata>	    - Check root= (did the system wait for the right device?)
[16:20:40] <ottomata>	  - Missing modules (cat /proc/modules; ls /dev)
[16:20:40] <ottomata>	 ALERT!  /dev/disk/by-uuid/076f6ec1-8f05-447b-9f13-accfca1a5ec1 does not exist.  Dropping to a shell!
[16:20:41] <ottomata>	 hm
[16:20:41] <ottomata>	 ok
[16:20:41] <wikibugs>	 06Operations, 10ops-codfw, 10ops-eqiad, 10vm-requests: eqiad/codfw: 1 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2330768 (10Danny_B)
[16:20:50] <ottomata>	 cmjohnson1_:  i will wait for you to check
[16:20:50] <ottomata>	 ?
[16:21:15] <icinga-wm>	 PROBLEM - puppet last run on sca2002 is CRITICAL: CRITICAL: puppet fail
[16:21:58] <ottomata>	 can't seem to get out of initramfs
[16:22:05] <icinga-wm>	 PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:22:30] <grrrit-wm>	 (03PS1) 10Thcipriani: Do not include public keys in keyholder check [puppet] - 10https://gerrit.wikimedia.org/r/290966 
[16:22:47] <thcipriani>	 ^ godog should fix keyholder
[16:23:01] <thcipriani>	 er, keyholder checks rather
[16:23:06] <godog>	 thcipriani: nice, taking a look now
[16:23:25] <cmjohnson1_>	 ottomata: can you log out of the vsp for 1001/1002 plz
[16:23:32] <ottomata>	 trying...
[16:23:34] <ottomata>	 not really sure how
[16:23:35] <icinga-wm>	 PROBLEM - Keyholder SSH agent on mira is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it.
[16:23:46] <cmjohnson1_>	 esc (
[16:23:54] <ottomata>	 ah
[16:23:55] <ottomata>	 thank you
[16:23:57] <ottomata>	 out
[16:24:05] <ottomata>	 of 1001
[16:24:16] <ottomata>	 and 1002 out too
[16:24:22] <cmjohnson1_>	 thx
[16:24:35] <thcipriani>	 godog: also sca2002/aqs1003 puppet run fails are probably this change, but I'm unsure what would be changing there.
[16:24:39] <cmjohnson1_>	 not sure what the issue is 1003 has the right setup..i know robh ran into an issue like this with another HP
[16:24:42] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Add druid1003's MAC to linxu-host-entries [puppet] - 10https://gerrit.wikimedia.org/r/290962 (https://phabricator.wikimedia.org/T134275) (owner: 10Ottomata)
[16:24:43] <cmjohnson1_>	 not sure what fixed it now
[16:24:54] <godog>	 thcipriani: yeah I was looking at that too, Error: Could not retrieve catalog from remote server: Error 400 on SERVER: secret(): invalid secret keyholder/deploy-service.pub at /etc/puppet/modules/scap/manifests/target.pp:83 on node sca2002.codfw.wmnet
[16:25:02] <ottomata>	 ha, ok
[16:25:34] <cmjohnson1_>	 sorry fixed it then
[16:26:04] <icinga-wm>	 RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:26:05] <thcipriani>	 godog: hmm, either the file isn't in the secret module or it's unreadable by puppetmaster looks like
[16:26:13] <ottomata>	 oh cmjohnson1_ 1003 seems fine now
[16:26:16] <ottomata>	 i was about to get in
[16:26:18] <ottomata>	 to mgmt
[16:26:34] <thcipriani>	 https://github.com/wikimedia/operations-puppet/blob/production/modules/wmflib/lib/puppet/parser/functions/secret.rb#L23
[16:26:36] <ottomata>	 am ready to netboot install it too
[16:26:44] <ottomata>	 so, if you could check on 1001,1002,1003 to make sure bios settings are correct
[16:26:48] <ottomata>	 then i will try again
[16:26:55] <icinga-wm>	 PROBLEM - puppet last run on aqs1001 is CRITICAL: CRITICAL: puppet fail
[16:26:58] <cmjohnson1_>	 1003 is rebooting now if you wanna login
[16:26:58] <ottomata>	 (or if you can get them installed, that is good too!)
[16:27:03] <ottomata>	 to vps?
[16:27:05] <ottomata>	 uh
[16:27:08] <cmjohnson1_>	 yes
[16:27:13] <ottomata>	 vsp i mean
[16:27:20] <ottomata>	 ok
[16:27:30] <cmjohnson1_>	 I do not see anything wrong with setup
[16:27:50] <ottomata>	 oh ok
[16:27:52] <godog>	 thcipriani: yeah the name in secret is deploy_service not deploy-service
[16:28:02] <ottomata>	 thought you were saying there was some legacy bios thing that was not right
[16:28:24] <cmjohnson1_>	 i thought the SATA AHCI wasn't set...but I did do that..so not sure
[16:28:42] <cmjohnson1_>	 trying to install 1002 now to see what it says
[16:28:48] <ottomata>	 AHCI SATA Controller (v0.87) :)
[16:28:50] <ottomata>	 k
[16:28:52] <cmjohnson1_>	 yep
[16:28:54] <ottomata>	 i'm watching 1003
[16:28:55] <cmjohnson1_>	 that's correct
[16:29:16] <icinga-wm>	 PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: puppet fail
[16:29:31] <ottomata>	 cool 1003 is netbooting
[16:29:59] <thcipriani>	 godog: ah, I see what's happening. In keyholder::agent the keyname has anything \W replaced with _ whereas in scap::target that's not happening.
[16:30:15] <icinga-wm>	 PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: puppet fail
[16:30:21] <thcipriani>	 godog: I can patch as well. Sorry :(
[16:30:25] <icinga-wm>	 PROBLEM - puppet last run on aqs1002 is CRITICAL: CRITICAL: puppet fail
[16:30:50] <godog>	 thcipriani: np, yeah I think that's what's happening, odd it wasn't caught before though
[16:30:55] <icinga-wm>	 PROBLEM - puppet last run on scb2001 is CRITICAL: CRITICAL: puppet fail
[16:31:27] <elukey>	 mmmm
[16:31:28] <elukey>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: secret(): invalid secret keyholder/deploy-service.pub at /etc/puppet/modules/scap/manifests/target.pp:83 on node aqs1001.eqiad.wmnet
[16:31:34] <icinga-wm>	 PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: puppet fail
[16:32:07] <godog>	 elukey: yeah, a few lines up in the backscroll
[16:32:29] <elukey>	 godog: ahhh sorry didn't see it, thanks :)
[16:32:39] <wikibugs>	 Operations, ops-codfw: ms-be2012.codfw.wmnet: slot=10 dev=sdk failed - https://phabricator.wikimedia.org/T135975#2330837 (Papaul) p:Triage>Normal
[16:32:46] <wikibugs>	 Operations, ops-codfw, Patch-For-Review: codfw old mw app server decomission - https://phabricator.wikimedia.org/T135468#2330840 (Papaul) p:Triage>Normal
[16:33:20] <wikibugs>	 Operations, Parsing-Team, Services, Mobile-Content-Service: ChangeProp / RESTBase / Parsoid outage 2016-05-05 - https://phabricator.wikimedia.org/T134537#2330841 (mobrovac)
[16:33:39] <wikibugs>	 Operations, ops-codfw, DBA, hardware-requests, Patch-For-Review: Decommission es2005-es2010 - https://phabricator.wikimedia.org/T134755#2330842 (Papaul) p:Triage>Normal
[16:34:14] <grrrit-wm>	 (PS1) Thcipriani: Fix key name in scap::target [puppet] - https://gerrit.wikimedia.org/r/290973
[16:34:36] <wikibugs>	 Operations, Parsing-Team, Services, Mobile-Content-Service, Patch-For-Review: Create functional cluster checks for all services (and have them page!) - https://phabricator.wikimedia.org/T134551#2330846 (mobrovac) I think we can consider this resolved now?
[16:34:55] <icinga-wm>	 PROBLEM - puppet last run on sca2001 is CRITICAL: CRITICAL: puppet fail
[16:35:21] <wikibugs>	 Operations, Parsing-Team, Services, Mobile-Content-Service: ChangeProp / RESTBase / Parsoid outage 2016-05-05 - https://phabricator.wikimedia.org/T134537#2268779 (mobrovac) From what I can tell all but the Parsoid issues have been dealt with. Should we resolve this?
[16:36:08] <ottomata>	 hmm cmjohnson1_    Installation step failed                      │
[16:36:08] <ottomata>	      │ An installation step failed. You can try to run the failing item  │
[16:36:08] <ottomata>	      │ again from the menu, or skip it and choose something else. The    │
[16:36:08] <ottomata>	      │ failing step is: Select and install software                      │
[16:36:08] <ottomata>	      │
[16:36:11] <ottomata>	 oook....?
[16:36:11] <godog>	 thcipriani: you should change the variable also in the secret() call, looks good otherwise
[16:36:20] <grrrit-wm>	 (PS2) Thcipriani: Fix key name in scap::target [puppet] - https://gerrit.wikimedia.org/r/290973
[16:36:28] <ottomata>	 no indication as to why
[16:36:39] <cmjohnson1_>	 ottomata: that is not h/w related
[16:36:43] <ottomata>	 aye
[16:36:45] <cmjohnson1_>	 something wrong with partman recipe most likely
[16:36:51] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2330849 (ema) When are we seeing those inconsistencies? Any specific timeframes?
[16:36:54] <ottomata>	 ah ok
[16:36:56] <ottomata>	 likely :)
[16:37:15] <mutante>	 ottomata: from the busybox installer shell, you can maybe find more details in /var/log/ , logs of the installer itself
[16:37:24] <mutante>	 i forget the exact path
[16:37:26] <ottomata>	 busybox?
[16:37:37] <ottomata>	 Execute a shell
[16:37:37] <ottomata>	 ?
[16:37:40] <mutante>	 the shell you get when you "execute shell" from within the installer
[16:37:43] <mutante>	 yes
[16:37:43] <ottomata>	 ah yes
[16:37:53] <thcipriani>	 godog: confirmed that patch fixes https://gerrit.wikimedia.org/r/#/c/290973/
[16:38:34] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] Fix key name in scap::target [puppet] - https://gerrit.wikimedia.org/r/290973 (owner: Thcipriani)
[16:38:34] <mutante>	 ottomata: there should be one log for partman and one for the installer or so
[16:38:38] <godog>	 thcipriani: yup thanks for the quick fix!
[16:38:38] <ottomata>	 ja
[16:39:22] <thcipriani>	 godog: thank you for the quick merges, sorry for the rocky deploy.
[16:40:05] <icinga-wm>	 PROBLEM - puppet last run on scb2002 is CRITICAL: CRITICAL: puppet fail
[16:40:16] <godog>	 thcipriani: haha that's okay, no worries
[16:40:33] <godog>	 puppet should be recovering soon
[16:40:45] <icinga-wm>	 PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: puppet fail
[16:40:48] <grrrit-wm>	 (PS2) Filippo Giunchedi: Do not include public keys in keyholder check [puppet] - https://gerrit.wikimedia.org/r/290966 (owner: Thcipriani)
[16:40:49] <ottomata>	 mutante:  hmm, not sure what to look for in these logs
[16:40:50] <ottomata>	 but
[16:40:58] <ottomata>	 the partitions/md/lvm looks right
[16:41:29] <grrrit-wm>	 (CR) Filippo Giunchedi: [C: 2 V: 2] Do not include public keys in keyholder check [puppet] - https://gerrit.wikimedia.org/r/290966 (owner: Thcipriani)
[16:41:46] <mutante>	 ottomata: hmm, yea, just to find out which was the last step before it failed
[16:41:55] <icinga-wm>	 RECOVERY - puppet last run on sca2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:42:10] <mutante>	 like the end of the installer log then
[16:42:15] <ottomata>	 hm
[16:42:18] <ottomata>	 things like
[16:42:18] <ottomata>	 May 26 16:37:23 in-target:  bind9-host : Depends: libbind9-90 (= 1:9.9.5.dfsg-9+deb8u6) but it is not going to be installed
[16:42:25] <ottomata>	 May 26 16:37:23 in-target: E: Unable to correct problems, you have held broken packages.
[16:42:29] <ottomata>	 May 26 16:37:23 in-target:  rpcbind : Depends: libtirpc1 (>= 0.2.4-2~) but it is not installable
[16:42:50] <ottomata>	 ja cause the step that failed was installing packages
[16:42:57] <ottomata>	 i think partman and base OS was fine
[16:43:17] <icinga-wm>	 RECOVERY - puppet last run on aqs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:43:19] <mutante>	 yea, that would be after partman, package install, yea
[16:43:43] <robh>	 ottomata: i had that happen to me a lot one week
[16:43:46] <robh>	 and then they went away
[16:43:46] <ottomata>	 oh ja?
[16:43:48] <ottomata>	 ?
[16:43:48] <ottomata>	 haha
[16:43:52] <robh>	 and i never figured out why =P
[16:43:52] <mutante>	 bind9-host .. havent we seen this before
[16:43:58] <mutante>	 what rob said
[16:44:01] <ottomata>	 actually i think bind9-host is ok
[16:44:16] <ottomata>	 i think 
[16:44:16] <ottomata>	 May 26 16:37:23 in-target:  rpcbind : Depends: libtirpc1 (>= 0.2.4-2~) but it is not installable
[16:44:16] <mutante>	 yea, but that dependency problem there
[16:44:18] <ottomata>	 is the main prob
[16:44:33] <ottomata>	 those others could be related too
[16:44:34] <ottomata>	 hmm ja
[16:44:35] <robh>	 the dependency issue during installs seemed to be a transient one that wasnt actual package issues
[16:44:41] <robh>	 but that was before and who knows
[16:44:47] <ottomata>	 aye, it can't reach out to network maybe or something?
[16:44:55] <ottomata>	 apt
[16:44:55] <ottomata>	 ?
[16:44:58] <icinga-wm>	 RECOVERY - Keyholder SSH agent on mira is OK: OK: Keyholder is armed with all configured keys.
[16:45:12] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2330888 (elukey)
[16:45:18] <cmjohnson1_>	 ottomata on 1003 i see this http://p.defau.lt/?EOZ_yG38xNN2RkitoRs_nQ
[16:45:22] <godog>	 thcipriani: looking good, mira rearmed as well
[16:45:39] <ottomata>	 cmjohnson1_:  on 1003?
[16:45:44] <ottomata>	 i'm in installer still on 1003
[16:45:45] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2330335 (elukey) >>! In T136314#2330849, @ema wrote: > When are we seeing those inconsistencies? Any specific timeframes?   Need to query Hive and Hadoop...
[16:45:45] <ottomata>	 in a shell
[16:45:51] <thcipriani>	 godog: \o/ awesome! Thanks for your help!
[16:46:00] <cmjohnson1_>	 sorry 1002
[16:46:27] <godog>	 thcipriani: np! thanks twentyafterfour too
[16:46:31] <ottomata>	 H
[16:46:32] <ottomata>	 ah
[16:46:37] <ottomata>	 yeah i think i got that on 1001 at some point
[16:46:55] <ottomata>	 dunno what makes it boot into that, but i'm guessing it is that the installer didn't finish properly
[16:46:57] <ottomata>	 but it did install the os
[16:47:13] <mutante>	 ottomata: did it successfully install any other package before that.. or can it just not install any package .. is what im wondering now
[16:47:26] <mutante>	 if the latter maybe it's just network/vlan/proxy
[16:47:49] <mutante>	 to get to the apt repo
[16:48:24] <godog>	 Dereckson: saw your patch but I have to go shortly and can't make it today :( sorry about that
[16:48:33] <Bsadowski1>	 Hmm
[16:49:56] <ottomata>	 May 26 16:37:21 in-target: E: Package 'laptop-detect' has no installation candidate
[16:50:09] <ottomata>	 May 26 16:37:21 debconf: --> GET mirror/http/proxy
[16:50:09] <ottomata>	 May 26 16:37:21 debconf: <-- 0 http://webproxy.eqiad.wmnet:8080
[16:50:27] <ottomata>	 May 26 16:37:22 in-target: E: Unable to locate package installation-report
[16:50:38] <ottomata>	 May 26 16:37:22 in-target: E: Unable to locate package popularity-contest
[16:51:02] <ottomata>	 mutante:  i *think* I don't see any successful post core os package installations
[16:51:20] <wikibugs>	 Operations, hardware-requests: new labstore hardware for eqiad - https://phabricator.wikimedia.org/T126089#2330946 (Cmjohnson)
[16:51:21] <mutante>	 hmmm.. maybe it cant talk to webproxy.eqiad from the new VLAN
[16:51:22] <wikibugs>	 Operations, ops-eqiad, DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2330943 (Cmjohnson) Open>Resolved Removed from puppet, salt and wiped disks. The error was mine, I for some reason didn't think it was ever installed.
[16:51:50] <mutante>	 maybe logs on webproxy.eqiad.wmnet then
[16:52:22] <mutante>	 would it say "Unable to locate package" if it really meant " i could not even ask"?
[16:52:35] <mutante>	 still unable to.. but a different kind
[16:52:56] <ottomata>	 ja dunno
[16:53:17] <icinga-wm>	 RECOVERY - puppet last run on aqs1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:54:17] <mutante>	 ottomata: from where to where did it move.. network wise
[16:55:19] <Dereckson>	 godog: ack, no problem
[16:55:28] <icinga-wm>	 RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[16:55:59] <ottomata>	 mutante:  it has never been installed before
[16:56:03] <ottomata>	 but, it is in the analytics vlan
[16:56:08] <icinga-wm>	 RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[16:56:37] <icinga-wm>	 RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:56:47] <icinga-wm>	 RECOVERY - puppet last run on aqs1002 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[16:57:12] <mutante>	 ottomata: ah, ok. and other servers in analytics vlan can use that webproxy just fine i assume.  how  about  tail -f /var/log/squid3/access.log  on carbon (webproxy)  while you try it again
[16:58:54] <ottomata>	 hm, mutante retrying just Select and Install software doesn't show anything there
[16:58:57] <robh>	 well, when was the last install in analytics vlan and was it for jessie?
[16:58:57] <ottomata>	 it fails pretty quickly though
[16:59:14] <robh>	 i dont recall what system i had the error on
[16:59:16] <mutante>	 hmmm
[16:59:17] <ottomata>	 robh, i think the recent aqs100[456]
[16:59:18] <icinga-wm>	 RECOVERY - puppet last run on scb2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:59:19] <ottomata>	 and those are jessie
[16:59:22] <ottomata>	 oh, but those
[16:59:25] <ottomata>	 are not in analytics vlan
[16:59:26] <ottomata>	 hm
[16:59:31] <ottomata>	 hm
[16:59:49] <robh>	 yeah, i am wondering if this is a vlan issue for apt/security/routing or something else, no idea
[17:00:02] <robh>	 but id say for kicks try to install trusty and see if it has the error?
[17:00:04] <jouncebot>	 yurik gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T1700).
[17:00:09] <robh>	 just to narrow down scope
[17:00:13] <mutante>	 it feels like double checking the vlan config would be good, yea
[17:00:19] <mutante>	 and what rob said ..
[17:00:38] <robh>	 if it works for trusty and not jessie, then we know its likely NOT network routing policies
[17:01:00] <robh>	 and then just a jessie config/package/something related issue
[17:01:05] <robh>	 heh, 'just'
[17:01:18] <icinga-wm>	 RECOVERY - puppet last run on sca2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:04:39] <ottomata>	 hm
[17:04:41] <ottomata>	 robh, but
[17:04:44] <ottomata>	 jessie does install
[17:04:53] <ottomata>	 the core os does
[17:04:57] <icinga-wm>	 PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.48.133, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[17:04:59] <robh>	 yes, but they arent the same packages
[17:05:01] <ottomata>	 hm
[17:05:03] <ottomata>	 ja
[17:05:04] <robh>	 trusty/jessie
[17:05:08] <icinga-wm>	 PROBLEM - Restbase root url on restbase1014 is CRITICAL: Connection refused
[17:05:11] <ottomata>	 i'd like to verify that webproxy works from the shell
[17:05:13] <ottomata>	 not sure how to do that
[17:05:17] <mutante>	 so we know that tftp works, but when it gets to install packages via the http proxy.. maybe not
[17:05:23] <robh>	 so im just trying to determine if its a network security policy thing for that vlan, or a jessie package issue
[17:05:34] <ottomata>	 trying to find a way...
[17:05:37] <robh>	 if ubuntu apt works, then we know its not network
[17:05:37] <icinga-wm>	 RECOVERY - puppet last run on scb2002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[17:05:52] <robh>	 since it has no info, seemed easy enough to just locally hack out the jessie line for dhcp
[17:05:54] <robh>	 and reinstall
[17:06:14] <robh>	 (which would rule out the network policy for apt issue since i cannot view or cannot make sense of them off the router ;)
[17:06:36] <robh>	 my idea may not be valid, hence i share why i suggest =]
[17:06:38] <mutante>	 yea, try the trusty install, i have a feeling it might just work 
[17:06:44] <ottomata>	 k will try it, one sec
[17:07:01] <robh>	 that would narrow it down to a jessie issue if it does, well, jessie with our apt/packages
[17:07:06] <mobrovac>	 what's with rb1014?
[17:07:10] <mobrovac>	 urandom: known ^ ?
[17:07:18] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:07:18] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:07:18] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m2 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:07:18] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:07:19] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s5 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:07:19] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:07:29] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:07:29] <icinga-wm>	 PROBLEM - MariaDB Slave IO: x1 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:07:38] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s6 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:07:38] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:07:58] <icinga-wm>	 RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[17:08:08] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:08:08] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s1 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:08:23] <robh>	 uh
[17:08:26] <jynus>	 what is this
[17:08:28] <robh>	 Was that expected?
[17:08:32] <ottomata>	 robh:  local hack on carbon?
[17:08:32] <jynus>	 nope
[17:08:38] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:08:48] <icinga-wm>	 RECOVERY - Keyholder SSH agent on tin is OK: OK: Keyholder is armed with all configured keys.
[17:08:48] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s3 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:08:48] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:08:57] <robh>	 ottomata: halt puppet, open the /etc/dhcp/linux.hosts.blah and remove the two lines for jessie for the system you are installing
[17:08:58] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:08:58] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s7 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:09:05] <robh>	 then reboot it into ubuntu, once installer loads, reenable puppet
[17:09:08] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state could not connect
[17:09:08] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s4 on dbstore1002 is CRITICAL: CRITICAL slave_io_state could not connect
[17:09:18] <robh>	 (easier than putting in a new patch just for a single reboot)
[17:09:51] <ottomata>	 k
[17:10:13] <jynus>	 I think mysql crashed
[17:10:23] <jynus>	 and that is really bad news
[17:10:25] <grrrit-wm>	 (PS1) Faidon Liambotis: Create raid module to hold RAID monitoring checks [puppet] - https://gerrit.wikimedia.org/r/290986 (https://phabricator.wikimedia.org/T84050)
[17:10:26] <ottomata>	 cmjohnson1_:  i still don't understand the boot order bootsource5 thing
[17:10:27] <grrrit-wm>	 (PS1) Faidon Liambotis: raid: add HP's RAID tool to the list [puppet] - https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998)
[17:10:29] <grrrit-wm>	 (PS1) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:10:34] <ottomata>	 status=2
[17:10:34] <ottomata>	 status_tag=COMMAND PROCESSING FAILED
[17:10:34] <ottomata>	 error_tag=INVALID TARGET
[17:10:37] <ottomata>	 there is no bootsource5 on these
[17:10:40] <jynus>	 for a tokudb host
[17:10:47] <urandom>	 mobrovac: nothing i am doing, no
[17:10:49] <paravoid>	 godog, volans: ^^^ please review
[17:10:51] <paravoid>	 (more will follow)
[17:11:14] <jynus>	 probably OOM
[17:11:53] <urandom>	 mobrovac: there is an instance bootstrapping there, but it hasn't started listening for connections yet
[17:12:01] <cmjohnson1_>	 ottomata: I don't know either
[17:12:27] <grrrit-wm>	 (CR) Dzahn: [C: 1] Apache: redirect pk.wikimedia.org to wikimediapakistan.org [puppet] - https://gerrit.wikimedia.org/r/286147 (https://phabricator.wikimedia.org/T56780) (owner: Dereckson)
[17:12:30] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Create raid module to hold RAID monitoring checks [puppet] - https://gerrit.wikimedia.org/r/290986 (https://phabricator.wikimedia.org/T84050) (owner: Faidon Liambotis)
[17:12:37] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050) (owner: Faidon Liambotis)
[17:12:42] <paravoid>	 blergh
[17:13:33] <grrrit-wm>	 (PS2) Faidon Liambotis: Create raid module to hold RAID monitoring checks [puppet] - https://gerrit.wikimedia.org/r/290986 (https://phabricator.wikimedia.org/T84050)
[17:13:35] <grrrit-wm>	 (PS2) Faidon Liambotis: raid: add HP's RAID tool to the list [puppet] - https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998)
[17:13:37] <grrrit-wm>	 (PS2) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:16:01] <ottomata1>	 aghhh
[17:16:06] <ottomata1>	 cmjohnson1_:  my internet just died
[17:16:08] <ottomata1>	 lost my connection to the vsp
[17:16:10] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050) (owner: Faidon Liambotis)
[17:16:15] <ottomata1>	 Virtual Serial Port is currently in use by another session.
[17:16:18] <ottomata1>	 how do I clear it?
[17:16:31] <robh>	 on an hp?
[17:16:32] <ottomata1>	 oh you have nice docs...
[17:16:36] <mobrovac>	 somebody seemed to have stopped rb there urandom
[17:16:36] <ottomata1>	 stop /system1/oemhp_vsp1
[17:16:36] <jynus>	 apt Hash Sum mismatch, nice
[17:16:37] <ottomata1>	 :)
[17:16:37] <mobrovac>	 wth?
[17:17:02] <grrrit-wm>	 (CR) Dzahn: "disregard that, i didn't look at the netmask right." [puppet] - https://gerrit.wikimedia.org/r/290348 (owner: Alex Monk)
[17:17:52] <mutante>	 ottomata1: racadm racreset
[17:17:57] <ottomata1>	 mutante:  this is a hp
[17:18:05] <ottomata1>	 i found it though
[17:18:06] <ottomata1>	 stop /system1/oemhp_vsp1
[17:18:06] <mutante>	 ottomata1: just realized, nvm 
[17:18:09] <ottomata1>	 :)
[17:18:13] <mutante>	 good to know 
[17:18:58] <icinga-wm>	 RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy
[17:19:08] <icinga-wm>	 RECOVERY - Restbase root url on restbase1014 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.027 second response time
[17:19:18] <grrrit-wm>	 (PS2) Dzahn: scap: add labtestwikitech to mediawiki-installation group [puppet] - https://gerrit.wikimedia.org/r/290348 (owner: Alex Monk)
[17:20:14] <mutante>	 jouncebot: next
[17:20:14] <jouncebot>	 In 1 hour(s) and 39 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T1900)
[17:21:13] <urandom>	 mobrovac: sorry, yeah, just came to the same conclusion
[17:21:18] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:21:34] <urandom>	 mobrovac: that it looked like it was just shutdown
[17:21:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:21:58] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:21:59] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:22:07] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:22:28] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:22:28] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:22:41] <jynus>	 yes, yes, we knew with the first time you told us
[17:22:48] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:22:58] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag could not connect
[17:24:11] <jynus>	 we'll see if you come back
[17:24:30] <grrrit-wm>	 (PS3) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:25:15] <grrrit-wm>	 (PS4) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:26:08] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2330335 (Nuria) 1. On labs or perhaps prod: Generate lots of request and a sighup and see if all requests ids are present, try to find repro for dropping...
[17:27:03] <grrrit-wm>	 (PS5) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:28:32] <wikibugs>	 Operations, Analytics-Kanban, Traffic: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2331150 (Ottomata) https://gist.github.com/ottomata/7048012
[17:30:18] <volans>	 jynus: needs help?
[17:30:38] <jynus>	 nah, now that it crashed, I am upgrading and restarting it
[17:32:11] <jynus>	 problem is what state it is in after restart
[17:32:16] <ottomata1>	 mutante: robh, fyi, ubuntu worked
[17:32:30] <robh>	 so its an issue with jessie specifically, and not our network
[17:32:35] <robh>	 progress
[17:32:57] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] raid: add HP's RAID tool to the list [puppet] - https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998) (owner: Faidon Liambotis)
[17:33:04] <jynus>	 volans, on the good side, we know what caused es2019/es2017 crashes
[17:33:33] <paravoid>	 lol wtf jenkins
[17:33:35] <paravoid>	 18 minutes?
[17:33:37] <cmjohnson1_>	 robh yay progress
[17:33:46] <paravoid>	 for something that doesn't look like an error anyway
[17:33:49] <mutante>	 ottomata1: oh! right, so installer issue with HP .. _again_ :/
[17:33:55] <volans>	 jynus: yay, saw the emails
[17:33:58] <grrrit-wm>	 (CR) Faidon Liambotis: "recheck" [puppet] - https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998) (owner: Faidon Liambotis)
[17:36:24] <ottomata1>	 mutante:  its HP jessie issue?
[17:37:15] <mdholloway>	 !log starting mobileapps deployment
[17:37:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:37:25] <grrrit-wm>	 (CR) Dereckson: "Ping?" [mediawiki-config] - https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: Lokal Profil)
[17:37:33] <ottomata>	 hey akosiaris, did you disable puppet on carbon?
[17:37:36] <ottomata>	 it was disabled a bit ago
[17:37:38] <ottomata>	 and i disabled it too
[17:37:43] <ottomata>	 but not sure if someone reenabled in between
[17:37:47] <ottomata>	 so not sure if i should reenable it
[17:37:58] <mutante>	 ottomata: i just say that because we know it doesnt happen with trusty and it's HP hardware and we had an installer bug before
[17:38:30] <ottomata>	 sigh, ok
[17:38:42] <ottomata>	 what should I do?
[17:39:13] <mutante>	 ottomata: create a ticket and paste the error from earlier with the package dependency issue
[17:39:42] <wikibugs>	 Operations, cassandra: Staging / Test environment(s) for RESTBase - https://phabricator.wikimedia.org/T136340#2331203 (Eevans)
[17:39:42] <mutante>	 let me look for an older one
[17:39:43] <ottomata>	 mutante:  tags?
[17:39:57] <mutante>	 ottomata: just "operations" i guess
[17:40:10] <ottomata>	 k
[17:40:56] <grrrit-wm>	 (CR) Dzahn: [C: 2] scap: add labtestwikitech to mediawiki-installation group [puppet] - https://gerrit.wikimedia.org/r/290348 (owner: Alex Monk)
[17:41:05] <wikibugs>	 Operations, cassandra, procurement: Staging / Test environment(s) for RESTBase - https://phabricator.wikimedia.org/T136340#2331215 (faidon) p:Triage>Normal
[17:41:51] <grrrit-wm>	 (PS6) Faidon Liambotis: raid: add a new "raid" fact [puppet] - https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050)
[17:42:55] <robh>	 ottomata: yeah and i'll add in my findings as well if i find the old host i had issue with last week
[17:43:24] <wikibugs>	 Operations: Jessie install on HP Fails - https://phabricator.wikimedia.org/T136341#2331222 (Ottomata)
[17:43:35] <wikibugs>	 Operations: Jessie install on HP Fails - https://phabricator.wikimedia.org/T136341#2331237 (Ottomata)
[17:43:37] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (Ottomata)
[17:43:43] <ottomata>	 ok robh
[17:43:52] <ottomata>	 https://phabricator.wikimedia.org/T136341
[17:43:54] <ottomata>	 there it is
[17:44:17] <wikibugs>	 Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (Ottomata) Currently stuck on T136341. :(
[17:44:17] <mutante>	 i tried but couldnt find a HP specific one yet
[17:45:41] <paravoid>	 why would that be HP specific?
[17:46:14] <paravoid>	 why would a dpkg error about bind9-host be HP specific, seriously
[17:47:35] <paravoid>	 ottomata: try reinstalling that system
[17:47:51] <mutante>	 just had vague memories of another install issue we had in the past with the HP servers
[17:49:42] <ottomata>	 paravoid:  ok
[17:49:50] <ottomata>	 wait,  uhhh
[17:49:52] <ottomata>	 druid1001 installed!
[17:49:56] <ottomata>	 just looked back at a screen
[17:49:59] <ottomata>	 ooooooook
[17:50:10] <paravoid>	 I updated d-i, not sure if there was any change
[17:50:19] <paravoid>	 try reinstalling the one that failed
[17:50:40] <wikibugs>	 Operations: Set jessie as the default os installer on network boot and manually mark other versions (precise, trusty) - https://phabricator.wikimedia.org/T133539#2234934 (Dzahn) Recently looked.. there are many jessie but this switch really starts to make sense once the appservers are switching now because t...
[17:51:17] <robh>	 ottomata: try it more than once to make sure! ;]
[17:51:48] <grrrit-wm>	 (PS1) Faidon Liambotis: raid: vary package installation on the RAID installed [puppet] - https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050)
[17:52:32] <ottomata>	 uhhh druid1003 also installed jessie, i think while i wasn't looking...?
[17:52:40] <robh>	 yeah i mean we should reinstall and watch it
[17:52:49] <ottomata>	 ok, well, 1002 needs to go
[17:52:51] <ottomata>	 so doing that now
[17:52:53] <robh>	 cool
[17:52:57] <robh>	 paravoid: thank you!
[17:53:30] <robh>	 it was a transient issue before when i had the package conflict messages similar to this
[17:53:38] <robh>	 in that i had it one evening, and by the next day i did not.
[17:54:11] <robh>	 and trying to find something in every phab task I touch for setups has proven fruitless =P
[17:54:18] <robh>	 (so no clue what system it was)
[17:56:04] <milimetric>	 jynus: I had just pulled a whole table out of analytics-store with sqoop a half hour or so before it crashed
[17:56:25] <mdholloway>	 !log mobileapps finished deploying 5ce4f31 (n.b. last deployment, on 23 May, appears to have re-deployed b8c396a)
[17:56:29] <milimetric>	 just letting you know because when you bring it back up I was planning on pulling some more (larger) tables
[17:56:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:58:23] <jynus>	 well, it crashed because out of memory error
[17:59:24] <milimetric>	 k, then if it crashes again with oom after I sqoop out of it, we'll know it was me :/
[17:59:25] <jynus>	 we will see why soon
[17:59:38] <jynus>	 milimetric, which tables?
[17:59:58] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1002 is OK: OK slave_sql_state not a slave
[17:59:58] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m2 on dbstore1002 is OK: OK slave_io_state not a slave
[18:00:03] <milimetric>	 I grabbed simplewiki.logging around 16:00 UTC
[18:00:09] <jynus>	 nah, not you
[18:00:16] <milimetric>	 k
[18:00:30] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m2 on dbstore1002 is OK: OK slave_sql_lag not a slave
[18:01:01] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2331296 (10Pcoombe)
[18:01:38] <jynus>	 milimetric, but please wait until I update on the email
[18:01:50] <jynus>	 if you start querying it will make the recovery slower
[18:04:11] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:04:12] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:04:12] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:04:12] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s7 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:04:31] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s5 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:04:31] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:04:42] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:04:42] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s4 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:04:52] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s6 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:04:53] <jynus>	 looks better than I thought
[18:05:11] <icinga-wm>	 RECOVERY - MariaDB Slave IO: x1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:05:20] <jynus>	 I think only x1 is broken, and it is only 200GB
[18:05:23] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:05:32] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:05:42] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:05:42] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s1 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:05:52] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:05:52] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s3 on dbstore1002 is OK: OK slave_io_state Slave_IO_Running: Yes
[18:06:12] <icinga-wm>	 PROBLEM - puppet last run on mw2115 is CRITICAL: CRITICAL: puppet fail
[18:07:12] <wikibugs>	 06Operations, 06Parsing-Team, 06Services, 03Mobile-Content-Service: ChangeProp / RESTBase / Parsoid outage 2016-05-05 - https://phabricator.wikimedia.org/T134537#2331306 (10GWicke) Most issues have indeed been addressed, and most of the remaining ones are also well underway. I agree that this task is no lo...
[18:09:23] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.13 seconds
[18:09:46] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: vary package installation on the RAID installed [puppet] - 10https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050) (owner: 10Faidon Liambotis)
[18:14:22] <ottomata>	 hmm, cmjohnson1 robh, druid1002 seems to be different
[18:14:31] <ottomata>	 the install looks like it completed properly
[18:14:32] <ottomata>	 then it rebooted
[18:14:36] <ottomata>	 but couldn't
[18:14:38] <ottomata>	 [    0.113889] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[18:14:48] <ottomata>	 mdadm: No devices listed in conf file were found.
[18:14:48] <ottomata>	 Gave up waiting for root device.  Common problems:
[18:14:59] <robh>	 oh, disks didnt detect in time
[18:15:02] <robh>	 there is a related task for that
[18:15:04] <robh>	 lemme find
[18:15:15] <robh>	 we've seen that on a number of jessie installs
[18:15:22] <robh>	 ottomata: you'll want to reference your issue on it as well
[18:15:35] <robh>	 https://phabricator.wikimedia.org/T131961
[18:15:49] <robh>	 try just a soft reset to see if it boots
[18:16:01] <robh>	 though the corrupted hw-pmu is new
[18:16:08] <robh>	 the root device error sounds related.
[18:17:47] <ottomata>	 soft reset?
[18:18:09] <jynus>	 milimetric, things look more or less good, but I would suggest waiting a day before doing heavy queries, as they may be slower than usual for now
[18:18:42] <milimetric>	 yep, I saw your email.  Hm... some of this stuff is time sensitive, I will try one table smaller than the one I grabbed before and see if things go well
[18:18:50] <milimetric>	 (I don't need fresh results)
[18:18:54] <ottomata>	 ok did power reset
[18:18:56] <ottomata>	 trying to boot
[18:19:42] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 103.05 seconds
[18:20:46] <ottomata>	 yup, robh, it booted
[18:20:49] <ottomata>	 this time
[18:20:53] <ottomata>	 still printed [    0.113817] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[18:21:03] <robh>	 so i dunno what that is at all
[18:21:17] <robh>	 you may wanna make a task so we try to figure it out, but if it's not blocking you, it can be lower priority
[18:21:38] <ottomata>	 it's not blocking me
[18:21:41] <ottomata>	 hm
[18:22:07] <robh>	 http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0126190
[18:22:13] <robh>	 so yeah, it's the power settings
[18:22:31] <robh>	 solution is there
[18:22:32] <ottomata>	 ok will make ticket
[18:22:41] <robh>	 but it's not a big deal, we should figure out the setting so make a task
[18:22:49] <robh>	 and i'll chase it down and update docs later, but it doesn't stop anything
[18:22:52] <ottomata>	 k
[18:23:01] <robh>	 it's just that the kernel thinks it should be able to control power settings and it's not being allowed to
[18:23:32] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 283.39 seconds
[18:23:48] <ottomata>	 robh https://phabricator.wikimedia.org/T136345
[18:23:49] <wikibugs>	 06Operations: HP Warning on boot [Firmware Bug]: the BIOS has corrupted hw-PMU resources - https://phabricator.wikimedia.org/T136345#2331341 (10Ottomata)
[18:24:13] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.35 seconds
[18:26:11] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: x1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[18:26:21] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: x1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.29 seconds
[18:28:16] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review, 07Performance: Lots of Title::purgeExpiredRestriction from API DELETE FROM `page_restrictions` WHERE (pr_expiry < '20160517063108') without batching/throttling potentially causing lag on s5-api - https://phabricator.wikimedia.org/T135470#2331377 (10aaron) 05Ope...
[18:28:18] <wikibugs>	 06Operations, 10DBA: High replication lag to dewiki - https://phabricator.wikimedia.org/T135100#2331379 (10aaron)
[18:28:31] <grrrit-wm>	 (03PS1) 10Ottomata: Fix lvname for druid volume in druid-4ssd-raid10.cfg [puppet] - 10https://gerrit.wikimedia.org/r/291004 
[18:28:59] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Fix lvname for druid volume in druid-4ssd-raid10.cfg [puppet] - 10https://gerrit.wikimedia.org/r/291004 (owner: 10Ottomata)
[18:31:20] <grrrit-wm>	 (03PS2) 10Dzahn: Apache: redirect pk.wikimedia.org to wikimediapakistan.org [puppet] - 10https://gerrit.wikimedia.org/r/286147 (https://phabricator.wikimedia.org/T56780) (owner: 10Dereckson)
[18:31:41] <wikibugs>	 06Operations: Jessie install on HP Fails - https://phabricator.wikimedia.org/T136341#2331413 (10Ottomata) 05Open>03Invalid Dunno what was up, but this problem went away.
[18:31:43] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2331415 (10Ottomata)
[18:31:59] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (10Ottomata) a:05Cmjohnson>03Ottomata
[18:32:18] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/deploy 3 eqiad druid nodes - https://phabricator.wikimedia.org/T134275#2260510 (10Ottomata) Servers are installed!
[18:34:42] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "checked with apache-fast-test, mw1017 canary, was on swat window already" [puppet] - 10https://gerrit.wikimedia.org/r/286147 (https://phabricator.wikimedia.org/T56780) (owner: 10Dereckson)
[18:35:42] <icinga-wm>	 RECOVERY - puppet last run on mw2115 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[18:46:32] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 247.60 seconds
[18:47:31] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 80.00 seconds
[18:50:47] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: Create raid module to hold RAID monitoring checks [puppet] - 10https://gerrit.wikimedia.org/r/290986 (https://phabricator.wikimedia.org/T84050) 
[18:50:49] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: add HP's RAID tool to the list [puppet] - 10https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998) 
[18:50:51] <grrrit-wm>	 (03PS7) 10Faidon Liambotis: raid: add a new "raid" fact [puppet] - 10https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050) 
[18:50:53] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: raid: vary package installation on the RAID installed [puppet] - 10https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050) 
[18:50:55] <grrrit-wm>	 (03PS1) 10Faidon Liambotis: raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 
[18:50:57] <grrrit-wm>	 (03PS1) 10Faidon Liambotis: raid: move check-raid.py into /usr/local/lib/nagios/plugins [puppet] - 10https://gerrit.wikimedia.org/r/291012 
[18:50:59] <grrrit-wm>	 (03PS1) 10Faidon Liambotis: raid: setup multiple checks, one per each RAID found [puppet] - 10https://gerrit.wikimedia.org/r/291013 (https://phabricator.wikimedia.org/T84050) 
[18:51:01] <grrrit-wm>	 (03PS1) 10Faidon Liambotis: raid: add monitoring for HP controllers [puppet] - 10https://gerrit.wikimedia.org/r/291014 (https://phabricator.wikimedia.org/T97998) 
[18:51:38] <paravoid>	 ...and with that, ttyl :)
[18:51:51] <paravoid>	 godog, volans, jynus, akosiaris: ^^^ :)
[18:54:23] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 (owner: 10Faidon Liambotis)
[18:55:57] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: setup multiple checks, one per each RAID found [puppet] - 10https://gerrit.wikimedia.org/r/291013 (https://phabricator.wikimedia.org/T84050) (owner: 10Faidon Liambotis)
[19:00:04] <jouncebot>	 twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T1900).
[19:03:29] <volans>	 paravoid: great, thanks, I will take a look
[19:04:06] <grrrit-wm>	 (03PS2) 10Dzahn: udp2log: move icinga checks from ./files/ to module [puppet] - 10https://gerrit.wikimedia.org/r/290871 
[19:04:18] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/2932/" [puppet] - 10https://gerrit.wikimedia.org/r/290871 (owner: 10Dzahn)
[19:04:19] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s7 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 178.09 seconds
[19:08:27] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: vary package installation on the RAID installed [puppet] - 10https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050) (owner: 10Faidon Liambotis)
[19:10:33] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: add HP's RAID tool to the list [puppet] - 10https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998) (owner: 10Faidon Liambotis)
[19:11:46] <grrrit-wm>	 (03PS2) 10Dzahn: udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - 10https://gerrit.wikimedia.org/r/290872 
[19:25:40] <wikibugs>	 06Operations, 10ops-codfw, 10DBA: es2017 and es2019 crashed with no logs - https://phabricator.wikimedia.org/T130702#2331636 (10Papaul) Will be receiving memory replacement tomorrow  Service Request 930250087 <<#7521282-32655863#>> Service Request 930256880 <<#7521282-32654588#>>
[19:27:57] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - 10https://gerrit.wikimedia.org/r/290872 (owner: 10Dzahn)
[19:28:21] <twentyafterfour>	 dear jouncebot: :P
[19:28:43] <Dereckson>	 !log mwscript initSiteStats.php --wiki fowiki --update (T136353)
[19:28:45] <stashbot>	 T136353: Reset statistics for fo.wikipedia - https://phabricator.wikimedia.org/T136353
[19:28:51] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:31:39] <logmsgbot>	 !log aaron@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: 9a9ec26d25 (duration: 00m 24s)
[19:31:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:32:02] <audephone>	 twentyafterfour are you deploying soon?
[19:32:41] <audephone>	 I'm around just a bit in the unlikely case of a problem with our code
[19:33:06] <twentyafterfour>	 audephone: yes going to push wmf.3 now. Thank you!
[19:33:13] <audephone>	 Okay 
[19:34:25] <grrrit-wm>	 (03PS1) 1020after4: all wikis to 1.28.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291020 
[19:35:22] <grrrit-wm>	 (03CR) 1020after4: [C: 032] all wikis to 1.28.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291020 (owner: 1020after4)
[19:36:09] <grrrit-wm>	 (03Merged) 10jenkins-bot: all wikis to 1.28.0-wmf.3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291020 (owner: 1020after4)
[19:40:44] <robh>	 twentyafterfour: any issues with the new labtestwikitech in dsh group?
[19:41:02] <twentyafterfour>	 robh: just about to find out
[19:41:13] <robh>	 heh, cool, im standing by to depool it in dsh if it does 
[19:41:18] <logmsgbot>	 !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.3
[19:41:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:41:34] <twentyafterfour>	 all good
[19:41:57] <twentyafterfour>	 robh: no problems with that. do I need to explicitly sync to make sure it's got everything up to date?
[19:42:34] <robh>	 im not entirely certain, i was just asked to be around in case it screwed up!  its testing wikitech stuff is my understanding
[19:42:38] <audephone>	 :)
[19:43:00] <audephone>	 I am hanging out a bit more
[19:43:17] <robh>	 twentyafterfour: mainly if it shows issues during your syncs we depool and let them sort it out later =]
[19:43:33] <robh>	 since its in itself a test box, im not sure we should spend time toubleshooting it.
[19:43:38] <robh>	 troubleshooting even.
[19:43:44] <robh>	 damn I cannot type today =P
[19:44:03] <twentyafterfour>	 I'm seeing a lot of Unknown namespace ID: 108
[19:44:19] <twentyafterfour>	 in search...
[19:44:23] <audephone>	 I don't know what it is
[19:44:38] <audephone>	 Probably ask erikb 
[19:48:32] <twentyafterfour>	 audephone: I brought it up in #wikimedia-discovery
[19:49:04] <audephone>	 Ok
[19:50:10] <robh>	 so no issues with the new labtestwikitech though
[19:50:11] <robh>	 ?
[19:50:15] <twentyafterfour>	 so I'm probably gonna have to roll back at least itwiki, full text search is failing there
[19:50:24] <twentyafterfour>	 robh: none with scap syncing
[19:50:27] <robh>	 cool
[19:50:35] <robh>	 that was the concern, so glad it didnt happen =]
[19:50:42] <robh>	 (not cool for your other issues, those stink)
[19:51:02] <twentyafterfour>	 robh :)  .. just problems with the wmf.3 branch, nothing for you to worry about I think
[19:51:19] <robh>	 I don't think worry entered into it
[19:51:26] * robh has been noming lunch this entire time
[19:51:53] <robh>	 well, half a lunch, i ran out of foodstuffs and need to go grocery shopping this evening.
[19:52:18] <robh>	 in fact, since there wasnt an issue, im going to run down the street for something else to eat, back shortly.
[19:52:51] <grrrit-wm>	 (03PS1) 10Hashar: apache: logrotate augeas rule needs apache2 package [puppet] - 10https://gerrit.wikimedia.org/r/291024 
[19:53:42] <twentyafterfour>	 cool
[19:54:40] <twentyafterfour>	 audephone: Notice: Undefined index: entities in /srv/mediawiki/php-1.28.0-wmf.3/extensions/Wikidata/extensions/ArticlePlaceholder/includes/SearchHookHandler.php on line 243
[19:54:49] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[19:56:25] <audephone>	 Twentyafterfour known and we have a fix 
[19:56:33] <audephone>	 I can deploy it tomorrow
[19:57:25] <twentyafterfour>	 audephone: ok cool thank you
[19:57:44] <audephone>	 Thanks for poking me
[19:58:49] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[20:03:51] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, 03Discovery-Maps-Sprint: water_polygons import is broken - https://phabricator.wikimedia.org/T112831#2331757 (10MaxSem) 05Open>03Resolved a:03MaxSem Nah, no more recurrences.
[20:04:43] <grrrit-wm>	 (03PS10) 10Ottomata: Initial debian packaging [debs/druid] - 10https://gerrit.wikimedia.org/r/287285 (https://phabricator.wikimedia.org/T134503) 
[20:05:16] <grrrit-wm>	 (03PS2) 10Ottomata: Update Kafka analytics broker list for deployment-prep [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287741 
[20:13:46] <ottomata>	 akosiaris:  yt?
[20:13:52] <ottomata>	 got some more reprepro updates qs
[20:16:48] <grrrit-wm>	 (03PS1) 10Yuvipanda: tools; Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 
[20:17:05] <YuviPanda>	 andrewbogott: chasemp ^ should solidify checker against more false positives too
[20:17:21] <ottomata>	 paravoid: yt?
[20:17:24] <YuviPanda>	 and is a pre-req for the webservice job working reliably I think
[20:17:32] <grrrit-wm>	 (03PS2) 10Yuvipanda: tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 
[20:17:48] <andrewbogott>	 can it still get all the tests done that need doing with one worker?
[20:18:24] <grrrit-wm>	 (03CR) 10Hashar: [V: 031] "That occurs when including contint::localhost_worker and is purely an ordering issue." [puppet] - 10https://gerrit.wikimedia.org/r/291024 (owner: 10Hashar)
[20:18:26] <YuviPanda>	 andrewbogott: yup, it'll just block
[20:18:44] <YuviPanda>	 andrewbogott: it'll just serialize access
[20:18:50] <grrrit-wm>	 (03PS2) 10Hashar: apache: logrotate augeas rule needs apache2 package [puppet] - 10https://gerrit.wikimedia.org/r/291024 (https://phabricator.wikimedia.org/T136301) 
[20:18:51] <andrewbogott>	 ok
[20:19:05] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 031] tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 (owner: 10Yuvipanda)
[20:21:41] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] apache: logrotate augeas rule needs apache2 package [puppet] - 10https://gerrit.wikimedia.org/r/291024 (https://phabricator.wikimedia.org/T136301) (owner: 10Hashar)
[20:24:08] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Initial debian packaging [debs/druid] - 10https://gerrit.wikimedia.org/r/287285 (https://phabricator.wikimedia.org/T134503) (owner: 10Ottomata)
[20:25:40] <grrrit-wm>	 (03PS3) 10Yuvipanda: tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 
[20:30:05] <grrrit-wm>	 (03PS4) 10Yuvipanda: tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 (https://phabricator.wikimedia.org/T136347) 
[20:30:25] <grrrit-wm>	 (03PS5) 10Yuvipanda: tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 (https://phabricator.wikimedia.org/T136347) 
[20:31:38] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 032] tools: Use only one uwsgi worker for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/291038 (https://phabricator.wikimedia.org/T136347) (owner: 10Yuvipanda)
[20:33:24] <grrrit-wm>	 (03PS1) 10Ottomata: Include cloudera reprepro updates in jessie-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/291043 (https://phabricator.wikimedia.org/T131974) 
[20:34:13] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Raise cache frontend memory sizes significantly - https://phabricator.wikimedia.org/T135384#2331912 (10BBlack) So far cp3048 seems slightly better off with the new params at 156G virt + 79G resident, but it will take days to level into a new normal (not recorded a...
[20:36:03] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Include cloudera reprepro updates in jessie-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/291043 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[20:38:04] <YuviPanda>	 ottomata: I merged you too
[20:38:05] <logmsgbot>	 !log twentyafterfour@tin Synchronized php-1.28.0-wmf.3/includes/specials/SpecialSearch.php: deploy hotfix for itwiki search T136356 (duration: 00m 23s)
[20:38:06] <stashbot>	 T136356: itwiki full text search: Unknown namespace ID: 108 - https://phabricator.wikimedia.org/T136356
[20:38:11] <ottomata>	 danke YuviPanda
[20:38:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:40:29] <mutante>	 "Warning: tag is a metaparam; this value will inherit to all contained resources in the toollabs::kubebuilder definition
[20:40:45] <mutante>	 i dunno what that is about yet
[20:41:06] <grrrit-wm>	 (03PS1) 10BBlack: VCL: lower TTL caps from 14 to 7 days [puppet] - 10https://gerrit.wikimedia.org/r/291059 (https://phabricator.wikimedia.org/T124954) 
[20:41:30] <grrrit-wm>	 (03PS2) 10BBlack: VCL: lower TTL caps from 14 to 7 days [puppet] - 10https://gerrit.wikimedia.org/r/291059 (https://phabricator.wikimedia.org/T124954) 
[20:41:45] <grrrit-wm>	 (03PS14) 10Ottomata: Druid module and analytics_cluster role class [puppet] - 10https://gerrit.wikimedia.org/r/288099 (https://phabricator.wikimedia.org/T131974) 
[20:42:00] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] VCL: lower TTL caps from 14 to 7 days [puppet] - 10https://gerrit.wikimedia.org/r/291059 (https://phabricator.wikimedia.org/T124954) (owner: 10BBlack)
[20:43:22] <grrrit-wm>	 (03PS3) 10Dzahn: udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - 10https://gerrit.wikimedia.org/r/290872 
[20:44:35] <grrrit-wm>	 (03Abandoned) 10Yuvipanda: Add dh-python to build dependencies [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/290611 (owner: 10Yuvipanda)
[20:47:48] <grrrit-wm>	 (03PS15) 10Ottomata: Druid module and analytics_cluster role class [puppet] - 10https://gerrit.wikimedia.org/r/288099 (https://phabricator.wikimedia.org/T131974) 
[20:47:50] <grrrit-wm>	 (03CR) 10Lokal Profil: "Just checking that the ping wasn't meant for me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/288582 (https://phabricator.wikimedia.org/T135212) (owner: 10Lokal Profil)
[20:48:18] <grrrit-wm>	 (03PS16) 10Ottomata: Druid module and analytics_cluster role class [puppet] - 10https://gerrit.wikimedia.org/r/288099 (https://phabricator.wikimedia.org/T131974) 
[20:48:39] <mutante>	 so jenkins-bot did it again, fail on pplint-HEAD, rebase, fixed 
[20:49:06] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] "Not yet applied anywhere, so safe to merge." [puppet] - 10https://gerrit.wikimedia.org/r/288099 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[20:50:42] <legoktm>	 twentyafterfour: https://gerrit.wikimedia.org/r/#/c/291099/ is the proper fix for the search and watchlist errors, I can deploy it once it merges
[20:51:03] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 031] "I've fixed the underlying tests for this now, and it works fine." [puppet] - 10https://gerrit.wikimedia.org/r/290681 (https://phabricator.wikimedia.org/T136162) (owner: 10Rush)
[20:54:08] <grrrit-wm>	 (03PS4) 10Dzahn: udp2log: mv rolematcher.py PacketLossLogtailer.py to module [puppet] - 10https://gerrit.wikimedia.org/r/290872 
[20:54:40] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/2933/" [puppet] - 10https://gerrit.wikimedia.org/r/290872 (owner: 10Dzahn)
[21:01:53] <twentyafterfour>	 legoktm: cool
[21:03:42] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2332011 (10hashar)
[21:12:20] <mutante>	 !log running update-ubuntu-mirror on carbon to check for T136307
[21:12:21] <stashbot>	 T136307: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307
[21:12:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:13:14] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2332025 (10hashar) 05Open>03Resolved a:03hashar Transient issue. it is gone now :)
[21:13:59] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2330101 (10Dzahn) The cron entry is there, can see in syslog that it runs ... no errors in error.log  now running it manually  ..
[21:19:19] <grrrit-wm>	 (03PS2) 10Dzahn: move/copy ubuntu-cloud.key into openstack/swift modules [puppet] - 10https://gerrit.wikimedia.org/r/290874 
[21:20:18] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.48.47:9042 on restbase2005 is OK: TCP OK - 0.038 second response time on port 9042
[21:21:32] <wikibugs>	 06Operations: Apt mirror for Ubuntu Trusty hash sum mismatch - https://phabricator.wikimedia.org/T136307#2332055 (10Dzahn) @hashar sum mismatch might be when the webproxy failed temp i guess.  I just synced it manually and it finished without problems. That doesnt change the status reported on launchpad.net thou...
[21:24:14] <grrrit-wm>	 (03PS3) 10Dzahn: move/copy ubuntu-cloud.key into openstack/swift modules [puppet] - 10https://gerrit.wikimedia.org/r/290874 
[21:25:00] <grrrit-wm>	 (03PS2) 10Dzahn: varnish: move errorpage.html from misc to module [puppet] - 10https://gerrit.wikimedia.org/r/290876 
[21:36:58] <grrrit-wm>	 (03Abandoned) 10EBernhardson: Change elasticsearch disk critical from 15% to 13% [puppet] - 10https://gerrit.wikimedia.org/r/290481 (owner: 10EBernhardson)
[21:38:29] <legoktm>	 jouncebot: next
[21:38:29] <jouncebot>	 In 1 hour(s) and 21 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T2300)
[21:42:00] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] varnish: move errorpage.html from misc to module [puppet] - 10https://gerrit.wikimedia.org/r/290876 (owner: 10Dzahn)
[21:43:48] <logmsgbot>	 !log legoktm@tin Synchronized php-1.28.0-wmf.3/includes/title/MediaWikiTitleCodec.php: TitleParser: In formatTitle(), don't throw exceptions on bad namespaces - T136352, T136356 (duration: 00m 28s)
[21:43:49] <stashbot>	 T136352: Special:EditWatchlist is broken - https://phabricator.wikimedia.org/T136352
[21:43:49] <stashbot>	 T136356: itwiki full text search: Unknown namespace ID: 108 - https://phabricator.wikimedia.org/T136356
[21:43:54] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:49:02] <logmsgbot>	 !log legoktm@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: Bail out in ApiStashEdit for bots for sanity (duration: 00m 25s)
[21:49:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:50:51] <logmsgbot>	 !log legoktm@tin Synchronized php-1.28.0-wmf.2/includes/api/ApiStashEdit.php: Bail out in ApiStashEdit for bots for sanity (duration: 00m 24s)
[21:50:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:54:14] <grrrit-wm>	 (03PS2) 10Dzahn: nagios: move check_command/config to own file [puppet] - 10https://gerrit.wikimedia.org/r/290877 
[21:54:26] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/2934/" [puppet] - 10https://gerrit.wikimedia.org/r/290877 (owner: 10Dzahn)
[21:59:45] <logmsgbot>	 !log ori@tin Synchronized php-1.28.0-wmf.3/includes/api/ApiStashEdit.php: 8521b7b069: Send edit stash metrics for cache attempts (duration: 00m 25s)
[21:59:52] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:03:22] <grrrit-wm>	 (03PS1) 10Ottomata: Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) 
[22:05:50] <grrrit-wm>	 (03PS2) 10Ottomata: Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) 
[22:08:33] <grrrit-wm>	 (03PS3) 10Ottomata: Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) 
[22:08:46] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[22:09:40] <grrrit-wm>	 (03PS4) 10Ottomata: Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) 
[22:11:34] <wikibugs>	 06Operations, 10MediaWiki-Categories, 07HHVM: Broken sorting and multi-page categories for Cyrillic wikis - https://phabricator.wikimedia.org/T136281#2332207 (10Joe)
[22:13:08] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Apply druid roles in production with initial (guesswork) configuration [puppet] - 10https://gerrit.wikimedia.org/r/291113 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[22:14:56] <wikibugs>	 06Operations, 07HHVM, 13Patch-For-Review, 07User-notice: Switch HAT appservers to trusty's ICU (or newer) - https://phabricator.wikimedia.org/T86096#2332243 (10Joe) s7 has been completed at 21.50 - as expected, being the smallest sized shard.  It went on at a decent speed of ~ 1.6 M records/hour, so we're...
[22:15:42] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, and 2 others: allow maps cluster Varnish cache purging - https://phabricator.wikimedia.org/T112836#2332246 (10Yurik)
[22:15:44] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, and 2 others: Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776#2332245 (10Yurik)
[22:16:47] <grrrit-wm>	 (03PS1) 10Hashar: (DO NOT SUBMIT) chromium on hold, drop ensure => latest [puppet] - 10https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) 
[22:17:20] <grrrit-wm>	 (03CR) 10Hashar: [C: 04-1 V: 04-1] (DO NOT SUBMIT) chromium on hold, drop ensure => latest [puppet] - 10https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) (owner: 10Hashar)
[22:18:09] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, and 2 others: Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776#2332266 (10Yurik)
[22:18:11] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, and 2 others: allow maps cluster Varnish cache purging - https://phabricator.wikimedia.org/T112836#2332265 (10Yurik) 05Open>03Resolved
[22:18:28] <icinga-wm>	 PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: puppet fail
[22:19:12] <grrrit-wm>	 (03PS1) 10Eevans: enable instance restbase2007-c [puppet] - 10https://gerrit.wikimedia.org/r/291117 (https://phabricator.wikimedia.org/T134016) 
[22:20:10] <urandom>	 mutante: can you hook me up with a merge on https://gerrit.wikimedia.org/r/#/c/291117 ?
[22:21:03] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "restbase2007-c.codfw.wmnet has address 10.192.16.178" [puppet] - 10https://gerrit.wikimedia.org/r/291117 (https://phabricator.wikimedia.org/T134016) (owner: 10Eevans)
[22:22:15] <mutante>	 urandom: yep, now it's active
[22:22:22] <urandom>	 mutante: thank you sir!
[22:22:30] <mutante>	 yw
[22:24:04] <grrrit-wm>	 (03PS1) 10Ottomata: Add union function from stdlib upstream [puppet] - 10https://gerrit.wikimedia.org/r/291119 
[22:24:29] <grrrit-wm>	 (03PS2) 10Ottomata: Add union function from stdlib upstream [puppet] - 10https://gerrit.wikimedia.org/r/291119 
[22:24:39] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add union function from stdlib upstream [puppet] - 10https://gerrit.wikimedia.org/r/291119 (owner: 10Ottomata)
[22:26:40] <wikibugs>	 06Operations, 06Discovery, 06Discovery-Search-Backlog, 03Discovery-Search-Sprint, 07Elasticsearch: Restart elasticsearch clusters for Java update - https://phabricator.wikimedia.org/T135499#2332286 (10Deskana) 05Open>03Resolved p:05Triage>03Normal We did this!
[22:28:39] <grrrit-wm>	 (03PS1) 10Ottomata: Set empty properties hash in druid/coordinator.yaml [puppet] - 10https://gerrit.wikimedia.org/r/291121 
[22:28:51] <grrrit-wm>	 (03PS1) 10Dzahn: add "lint:ignore"s for several "puppet URL without modules" [puppet] - 10https://gerrit.wikimedia.org/r/291122 
[22:30:14] <urandom>	 !log Bootstrapping restbase2007-c.codfw.wmnet : T134016
[22:30:15] <stashbot>	 T134016: RESTBase Cassandra cluster: Increase instance count to 3 - https://phabricator.wikimedia.org/T134016
[22:30:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:30:53] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Set empty properties hash in druid/coordinator.yaml [puppet] - 10https://gerrit.wikimedia.org/r/291121 (owner: 10Ottomata)
[22:33:14] <grrrit-wm>	 (03PS1) 10Greg Grossmeier: Remove Nik Everett's production access [puppet] - 10https://gerrit.wikimedia.org/r/291125 (https://phabricator.wikimedia.org/T130113) 
[22:33:40] <mutante>	 from nagios docs "You can have Nagios notify you of problems and recoveries pretty much anyway you want: pager, cellphone, email, instant message, audio alert, electric shocker, etc. "
[22:33:51] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2332316 (10RobH) a:03Pcoombe As this is simply expanding Peter's access, he already has a shell account setup/live.  Additionally, he has already signed the L3 document.    @pcoombe:...
[22:34:26] <grrrit-wm>	 (03PS1) 10Ottomata: Use quotes in some druid yaml values [puppet] - 10https://gerrit.wikimedia.org/r/291126 
[22:35:34] <wikibugs>	 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2331283 (10Ottomata) If you are looking for files just hosted on disk on stat1002, then you want `statistics-privatedata-users`.  I think this is probably what you need.  If this data is i...
[22:36:11] <wikibugs>	 06Operations, 10Monitoring, 07Icinga: re-create script for manual paging - https://phabricator.wikimedia.org/T82937#2332342 (10Dzahn) https://old.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=134
[22:36:59] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Use quotes in some druid yaml values [puppet] - 10https://gerrit.wikimedia.org/r/291126 (owner: 10Ottomata)
[22:37:00] <greg-g>	 stupid gerrit mangling task urls
[22:37:40] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.192.16.178:9042 on restbase2007 is CRITICAL: Connection refused
[22:37:50] <grrrit-wm>	 (03CR) 10Alex Monk: "The email was about deployment access but this removes elasticsearch+logstash root access as well?" [puppet] - 10https://gerrit.wikimedia.org/r/291125 (https://phabricator.wikimedia.org/T130113) (owner: 10Greg Grossmeier)
[22:39:34] <grrrit-wm>	 (03PS1) 10Ottomata: Remove extraneous quote in druid/middlemanager.yaml [puppet] - 10https://gerrit.wikimedia.org/r/291127 
[22:39:50] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Remove extraneous quote in druid/middlemanager.yaml [puppet] - 10https://gerrit.wikimedia.org/r/291127 (owner: 10Ottomata)
[22:41:51] <grrrit-wm>	 (03CR) 10Greg Grossmeier: "Production deployment access includes elastic and logstash." [puppet] - 10https://gerrit.wikimedia.org/r/291125 (https://phabricator.wikimedia.org/T130113) (owner: 10Greg Grossmeier)
[22:42:10] <grrrit-wm>	 (03PS1) 10Ottomata: Add analytics_cluster::hadoop::client to druid workers so CDH is installed [puppet] - 10https://gerrit.wikimedia.org/r/291128 (https://phabricator.wikimedia.org/T131974) 
[22:42:45] <grrrit-wm>	 (03PS2) 10Ottomata: Add analytics_cluster::hadoop::client to druid workers so CDH is installed [puppet] - 10https://gerrit.wikimedia.org/r/291128 (https://phabricator.wikimedia.org/T131974) 
[22:43:06] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add analytics_cluster::hadoop::client to druid workers so CDH is installed [puppet] - 10https://gerrit.wikimedia.org/r/291128 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[22:45:26] <wikibugs>	 06Operations, 06Discovery, 10Maps, 10Tilerator, 10Traffic: Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776#2332371 (10Yurik)
[22:47:54] <grrrit-wm>	 (03PS1) 10Ottomata: Install the druid service package for each service [puppet] - 10https://gerrit.wikimedia.org/r/291129 (https://phabricator.wikimedia.org/T131974) 
[22:48:29] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Install the druid service package for each service [puppet] - 10https://gerrit.wikimedia.org/r/291129 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[22:49:49] <icinga-wm>	 RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[22:50:02] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c CQL 10.192.16.178:9042 on restbase2007 is CRITICAL: Connection refused eevans Node is bootstrapping. - The acknowledgement expires at: 2016-05-28 22:49:44.
[22:52:20] <grrrit-wm>	 (03PS8) 10Faidon Liambotis: raid: add a new "raid" fact [puppet] - 10https://gerrit.wikimedia.org/r/290988 (https://phabricator.wikimedia.org/T84050) 
[22:52:22] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: raid: add monitoring for HP controllers [puppet] - 10https://gerrit.wikimedia.org/r/291014 (https://phabricator.wikimedia.org/T97998) 
[22:52:24] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: raid: move check-raid.py into /usr/local/lib/nagios/plugins [puppet] - 10https://gerrit.wikimedia.org/r/291012 
[22:52:26] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: raid: setup multiple checks, one per each RAID found [puppet] - 10https://gerrit.wikimedia.org/r/291013 (https://phabricator.wikimedia.org/T84050) 
[22:52:28] <grrrit-wm>	 (03PS2) 10Faidon Liambotis: raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 
[22:52:31] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: vary package installation on the RAID installed [puppet] - 10https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050) 
[22:53:10] <grrrit-wm>	 (03Abandoned) 10Faidon Liambotis: raid: add HP's RAID tool to the list [puppet] - 10https://gerrit.wikimedia.org/r/290987 (https://phabricator.wikimedia.org/T97998) (owner: 10Faidon Liambotis)
[22:53:16] <grrrit-wm>	 (03PS1) 10Ottomata: s/etc/druid/middleManager/etc/druid/middlemanager/ in druid-middlemanager.dirs [debs/druid] - 10https://gerrit.wikimedia.org/r/291130 
[22:53:51] <grrrit-wm>	 (03PS2) 10Ottomata: s/etc/druid/middleManager/etc/druid/middlemanager/ in druid-middlemanager.dirs [debs/druid] - 10https://gerrit.wikimedia.org/r/291130 
[22:53:56] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 04-2] "Looks fine, but see https://gerrit.wikimedia.org/r/#/c/291014/ (and its ancestors) instead, or IOW, https://gerrit.wikimedia.org/r/#/q/top" [puppet] - 10https://gerrit.wikimedia.org/r/290717 (https://phabricator.wikimedia.org/T97998) (owner: 10Volans)
[23:00:04] <jouncebot>	 RoanKattouw ostriches Krenair MaxSem awight Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160526T2300).
[23:00:04] <jouncebot>	 dapatrick Krenair ejegg: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:04] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 (owner: 10Faidon Liambotis)
[23:00:10] <Krenair>	 hey
[23:00:41] <Krenair>	 dapatrick, want to do your deploy?
[23:00:50] <Krenair>	 You should have the rights now
[23:02:09] <dapatrick>	 Krenair Uh, I'm not certain that I know how to do that.
[23:02:19] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: add monitoring for HP controllers [puppet] - 10https://gerrit.wikimedia.org/r/291014 (https://phabricator.wikimedia.org/T97998) 
[23:02:21] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: move check-raid.py into /usr/local/lib/nagios/plugins [puppet] - 10https://gerrit.wikimedia.org/r/291012 
[23:02:23] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: setup multiple checks, one per each RAID found [puppet] - 10https://gerrit.wikimedia.org/r/291013 (https://phabricator.wikimedia.org/T84050) 
[23:02:25] <grrrit-wm>	 (03PS3) 10Faidon Liambotis: raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 
[23:02:38] <Krenair>	 dapatrick, okay, I'll do it this time
[23:02:43] <dapatrick>	 csteipp Okay, thank you.
[23:02:52] <grrrit-wm>	 (03PS1) 10Ottomata: Add temporary debug message to puppet for union [puppet] - 10https://gerrit.wikimedia.org/r/291133 
[23:03:23] <Krenair>	 I'm not csteipp 
[23:03:49] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Add temporary debug message to puppet for union [puppet] - 10https://gerrit.wikimedia.org/r/291133 (owner: 10Ottomata)
[23:03:58] <dapatrick>	 Krenair Whoops. Sorry, I was just about to send a message to csteipp. 
[23:04:03] <dapatrick>	 Krenair Thank you. :)
[23:05:03] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: slightly change check-raid's "utility" names [puppet] - 10https://gerrit.wikimedia.org/r/291011 (owner: 10Faidon Liambotis)
[23:05:47] <icinga-wm>	 PROBLEM - Druid middlemanager on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server middlemanager
[23:05:54] <ottomata>	 haha
[23:05:55] <ottomata>	 alarms!
[23:05:56] <ottomata>	 amazing!
[23:05:57] <ottomata>	 shhh
[23:05:58] <wikibugs>	 06Operations, 10cassandra: change graphite aggregation function for cassandra 'count' metrics - https://phabricator.wikimedia.org/T121789#1888407 (10GWicke) If this is really a gauge, should the cassandra metric reporter perhaps report it as such?
[23:06:00] <grrrit-wm>	 (03PS1) 10Ottomata: More temporary debug info [puppet] - 10https://gerrit.wikimedia.org/r/291134 
[23:06:08] <icinga-wm>	 PROBLEM - Druid overlord on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server overlord
[23:07:12] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] More temporary debug info [puppet] - 10https://gerrit.wikimedia.org/r/291134 (owner: 10Ottomata)
[23:08:20] <grrrit-wm>	 (03PS1) 10Ottomata: Make debug notify unique [puppet] - 10https://gerrit.wikimedia.org/r/291135 
[23:08:45] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Make debug notify unique [puppet] - 10https://gerrit.wikimedia.org/r/291135 (owner: 10Ottomata)
[23:08:57] <icinga-wm>	 PROBLEM - Druid broker on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server broker
[23:09:06] <ottomata>	 i acked that hm
[23:09:17] <icinga-wm>	 PROBLEM - Druid coordinator on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server coordinator
[23:09:18] <ottomata>	 op, now i did
[23:09:22] <icinga-wm>	 ACKNOWLEDGEMENT - Druid broker on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server broker ottomata initial install
[23:09:22] <icinga-wm>	 ACKNOWLEDGEMENT - Druid coordinator on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server coordinator ottomata initial install
[23:09:22] <icinga-wm>	 ACKNOWLEDGEMENT - Druid historical on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server historical ottomata initial install
[23:09:22] <icinga-wm>	 ACKNOWLEDGEMENT - Druid middlemanager on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server middlemanager ottomata initial install
[23:09:22] <icinga-wm>	 ACKNOWLEDGEMENT - Druid overlord on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server overlord ottomata initial install
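[editor's note] The PROCS alerts above report how many running processes match a command name and an argument string. A minimal sketch of that kind of check follows, assuming a threshold of at least one matching process; the function name, the input format, and the threshold are illustrative assumptions and not the actual Icinga plugin or its configuration.

```python
def check_procs(processes, command, args_substring, minimum=1):
    """Count processes matching a command name and argument substring.

    `processes` is a list of (command_name, argument_string) tuples.
    `minimum` is an assumed threshold; the real check's thresholds are
    not shown in the log.
    """
    count = sum(
        1
        for name, args in processes
        if name == command and args_substring in args
    )
    status = "OK" if count >= minimum else "CRITICAL"
    noun = "process" if count == 1 else "processes"
    return (
        f"PROCS {status}: {count} {noun} "
        f"with command name {command}, args {args_substring}"
    )


# Hypothetical process table, for illustration only.
running = [
    ("java", "-cp /hypothetical/classpath io.druid.cli.Main server coordinator"),
    ("sshd", "-D"),
]
print(check_procs(running, "java", "io.druid.cli.Main server broker"))
print(check_procs(running, "java", "io.druid.cli.Main server coordinator"))
```

With the sample table, the broker check reports CRITICAL with 0 processes and the coordinator check reports OK with 1 process, mirroring the message shapes in the log.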
[23:10:13] <grrrit-wm>	 (03CR) 10Faidon Liambotis: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/291011 (owner: 10Faidon Liambotis)
[23:10:52] <ottomata>	 unnghhh apparently ruby renders arrays as strings differently in labs than in prod
[23:11:04] <ottomata>	 in puppet at least?
[23:11:53] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: vary package installation on the RAID installed [puppet] - 10https://gerrit.wikimedia.org/r/290999 (https://phabricator.wikimedia.org/T84050) (owner: 10Faidon Liambotis)
[23:14:18] <logmsgbot>	 !log krenair@tin Synchronized php-1.28.0-wmf.3/extensions/OATHAuth/special/SpecialOATHEnable.php: https://gerrit.wikimedia.org/r/#/c/291007/ (duration: 00m 39s)
[23:14:22] <Krenair>	 dapatrick, ^
[23:14:25] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:14:46] <dapatrick>	 Krenair, Swell, thanks again!
[23:14:50] <Krenair>	 does it work?
[23:15:09] <dapatrick>	 Verifying now.
[23:17:47] <dapatrick>	 Krenair, Yep, it works.
[23:18:16] <grrrit-wm>	 (03PS1) 10Ottomata: Use ruby json lib to render Arrays as strings in druid runtime.properties.erb [puppet] - 10https://gerrit.wikimedia.org/r/291137 (https://phabricator.wikimedia.org/T131974) 
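[editor's note] The patch above switches to a JSON library so that arrays in the rendered properties file no longer depend on the language's default array-to-string conversion, which ottomata reports differing between labs and prod. The log does not state the cause; one possibility is differing Ruby versions, since `Array#to_s` changed between Ruby 1.8 and 1.9. The same idea is sketched here in Python rather than in the Ruby/ERB of the actual template; the function and property names are hypothetical.

```python
import json


def render_property(key, value):
    """Render one key=value line for a properties file.

    Lists and dicts are serialized explicitly as JSON so the output does
    not depend on the language's default string conversion.
    """
    if isinstance(value, (list, dict)):
        rendered = json.dumps(value)
    else:
        rendered = str(value)
    return f"{key}={rendered}"


# Hypothetical property name and values, for illustration only.
print(render_property("example.extensions.list", ["ext-a", "ext-b"]))
# prints: example.extensions.list=["ext-a", "ext-b"]
```

Serializing explicitly keeps the rendered file identical across environments, at the cost of one extra library call in the template.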
[23:19:26] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: move check-raid.py into /usr/local/lib/nagios/plugins [puppet] - 10https://gerrit.wikimedia.org/r/291012 (owner: 10Faidon Liambotis)
[23:19:46] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] raid: setup multiple checks, one per each RAID found [puppet] - 10https://gerrit.wikimedia.org/r/291013 (https://phabricator.wikimedia.org/T84050) (owner: 10Faidon Liambotis)
[23:20:37] <dapatrick>	 Krenair You basically followed the steps at https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Case_1b:_extension.2Fskin.2Fvendor_changes, correct?
[23:21:15] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Use ruby json lib to render Arrays as strings in druid runtime.properties.erb [puppet] - 10https://gerrit.wikimedia.org/r/291137 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[23:22:28] <icinga-wm>	 RECOVERY - Druid broker on druid1001 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server broker
[23:22:44] <Krenair>	 that's the page yep
[23:22:48] <icinga-wm>	 RECOVERY - Druid coordinator on druid1001 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server coordinator
[23:23:37] <icinga-wm>	 RECOVERY - Druid overlord on druid1001 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server overlord
[23:24:43] <dapatrick>	 Krenair, Okay, I have not done that for extensions, but I've watched and taken notes when Chris was deploying to core, and yesterday I deployed a config change. It seems pretty similar.
[23:24:56] <Krenair>	 it's similar yes
[23:24:58] <dapatrick>	 Krenair, but I'm guessing in this case you were able to just update the submodule, correct?
[23:25:06] <Krenair>	 but there are also important differences
[23:25:11] <Krenair>	 Gerrit updates the submodule for 99% of extensions
[23:25:26] <Krenair>	 well, you have to submodule update on tin still of course
[23:25:57] <dapatrick>	 I mean because there were no "SECURITY:" patches in the log.
[23:26:19] <Krenair>	 We can't discuss that here.
[23:26:37] <dapatrick>	 Got it.
[23:28:10] <ejegg>	 Krenair: sorry i'm late to the party.  CentralNotice patch is not yet deployed, correct?
[23:28:21] <Krenair>	 correct
[23:28:27] <icinga-wm>	 PROBLEM - Druid broker on druid1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args io.druid.cli.Main server broker
[23:28:36] <ejegg>	 cool, i'm available to test whenever that goes out
[23:28:37] <Krenair>	 I was waiting on jenkins and then got distracted and didn't notice it complete
[23:28:47] <ejegg>	 word, no rush
[23:32:44] <grrrit-wm>	 (03PS1) 10BryanDavis: Add pep8 environment to tox.ini for jenkins job [puppet] - 10https://gerrit.wikimedia.org/r/291138 
[23:33:44] <paladox>	 Dereckson: Hi could you approve the translations at https://www.mediawiki.org/wiki/Template:WikimediaDownload please.
[23:33:46] <logmsgbot>	 !log krenair@tin Synchronized php-1.28.0-wmf.3/extensions/Math/modules/ve-math/ve.ui.MWMathContextItem.js: https://gerrit.wikimedia.org/r/#/c/290971/ (duration: 00m 28s)
[23:33:52] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:35:43] <Krenair>	 hm, didn't seem to take effecr
[23:35:45] <Krenair>	 effect*
[23:35:57] * Krenair blames RL caching
[23:36:01] <logmsgbot>	 !log krenair@tin Synchronized php-1.28.0-wmf.3/extensions/Math/modules/ve-math/ve.ui.MWMathContextItem.js: touch (duration: 00m 27s)
[23:36:06] <bd808>	 paravoid: https://gerrit.wikimedia.org/r/#/c/291138/ should fix the pep8 jobs failures from misconfiguration. Now it lists the bazillion pep8 violations in the repo
[23:36:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:37:38] <Krenair>	 that worked
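[editor's note] Krenair's workaround above was to touch the already-synced file and sync it again, after which the change took effect. A minimal sketch of why a touch can help follows, assuming purely for illustration a cache whose key includes the file's modification time. The log does not say how the real caching layer computes its keys, and `cache_key` is a hypothetical helper.

```python
import os
import tempfile
import time


def touch(path):
    """Set the file's modification time to now, leaving contents unchanged."""
    os.utime(path, None)


def cache_key(path):
    """Hypothetical cache key built from the path and its mtime."""
    return f"{path}:{os.stat(path).st_mtime_ns}"


def demo():
    """Return True if touching a file changes its hypothetical cache key."""
    with tempfile.NamedTemporaryFile(delete=False) as fh:
        fh.write(b"module contents")
        path = fh.name
    try:
        # Backdate the file so the touch is guaranteed to change the mtime.
        old = time.time() - 60
        os.utime(path, (old, old))
        before = cache_key(path)
        touch(path)
        after = cache_key(path)
        return before != after
    finally:
        os.remove(path)


print(demo())
```

The file contents are identical before and after; only the timestamp moves, which is enough to invalidate any cache entry keyed on it.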
[23:39:07] <Dereckson>	 paladox: ask translation admin rights?
[23:39:19] <paladox>	 OK
[23:40:17] <logmsgbot>	 !log krenair@tin Synchronized php-1.28.0-wmf.3/extensions/VisualEditor/modules/ve-mw/dm/models/ve.dm.MWTransclusionModel.js: https://gerrit.wikimedia.org/r/#/c/290994/ (duration: 00m 25s)
[23:40:23] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:41:31] <icinga-wm>	 PROBLEM - puppet last run on mw2128 is CRITICAL: CRITICAL: Puppet has 1 failures
[23:41:53] <Krenair>	 bah, same problem
[23:42:01] <icinga-wm>	 RECOVERY - Druid broker on druid1001 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server broker
[23:42:02] <Krenair>	 now it works
[23:42:14] <Krenair>	 ejegg, your turn
[23:42:35] <Krenair>	 ugh it's CentralNotice with the nonstandard deployment branches
[23:43:32] <grrrit-wm>	 (03PS1) 10Ottomata: Druid puppet improvements for prod [puppet] - 10https://gerrit.wikimedia.org/r/291140 (https://phabricator.wikimedia.org/T131974) 
[23:44:15] <Krenair>	 going through jenkins...
[23:44:25] <icinga-wm>	 CUSTOM - DPKG on planet2001 is OK: All packages OK
[23:44:44] <mutante>	 that was a test
[23:44:44] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032 V: 032] Druid puppet improvements for prod [puppet] - 10https://gerrit.wikimedia.org/r/291140 (https://phabricator.wikimedia.org/T131974) (owner: 10Ottomata)
[23:46:14] <ejegg>	 Krenair: yah, gotta have the same version everywhere
[23:47:11] <icinga-wm>	 RECOVERY - Druid middlemanager on druid1001 is OK: PROCS OK: 1 process with command name java, args io.druid.cli.Main server middleManager
[23:47:22] <icinga-wm>	 CUSTOM - DPKG on planet2001 is OK: All packages OK
[23:52:09] <grrrit-wm>	 (03PS1) 10Dzahn: rcstream: let wikitech connect to redis via IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/291142 
[23:53:03] <grrrit-wm>	 (03PS2) 10Alex Monk: rcstream: let wikitech connect to redis via IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/291142 (https://phabricator.wikimedia.org/T136245) (owner: 10Dzahn)
[23:53:11] <grrrit-wm>	 (03PS3) 10Alex Monk: rcstream: let wikitech connect to redis via IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/291142 (https://phabricator.wikimedia.org/T136245) (owner: 10Dzahn)
[23:53:20] <grrrit-wm>	 (03CR) 10Alex Monk: [C: 031] rcstream: let wikitech connect to redis via IPv6 [puppet] - 10https://gerrit.wikimedia.org/r/291142 (https://phabricator.wikimedia.org/T136245) (owner: 10Dzahn)
[23:56:54] <logmsgbot>	 !log krenair@tin Synchronized php-1.28.0-wmf.3/extensions/CentralNotice/resources/subscribing: https://gerrit.wikimedia.org/r/#/c/291120/1 (duration: 00m 24s)
[23:56:55] <Krenair>	 ejegg, ^ there you go, sorry for the wait
[23:57:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:57:09] <ejegg>	 thanks Krenair , I'll take a look!
[23:57:25] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/2937/rcs1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/291142 (https://phabricator.wikimedia.org/T136245) (owner: 10Dzahn)
[23:59:17] <Krenair>	 I have a couple of my own patches to do
[23:59:48] <mutante>	 eh.. surprise Krenair..
[23:59:54] <mutante>	 it doesn't do what we expected