[00:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T0000). Please do the needful.
[00:00:04] <jouncebot>	 kaldari: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[00:00:19] <kaldari>	 yeehaw! here we go!
[00:00:47] <godog>	 andre__: I'm going to remove HTTPS from T149977 since I don't think it's relevant and it adds traffic + operations, thoughts?
[00:00:48] <stashbot>	 T149977: After login, user not logged in when "prefershttps" set to false and "wgSecureLogin" set to true - https://phabricator.wikimedia.org/T149977
[00:01:09] <kaldari>	 or not
[00:01:12] <andre__>	 godog, do it? :)
[00:01:21] <kaldari>	 I guess the train deployment didn't happen?
[00:01:27] <kaldari>	 My patch depends on the train
[00:02:15] <mutante>	 jouncebot: now
[00:02:15] <jouncebot>	 For the next 0 hour(s) and 57 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T0000)
[00:02:32] <godog>	 andre__: heh, I thought #HTTPS might have had some hidden meaning I wasn't aware of
[00:02:50] <andre__>	 godog: I won't know more than what its project description says, sorry :P
[00:03:03] <grrrit-wm>	 (PS2) Dzahn: Phabricator: Don't use vcs group, use phd [puppet] - https://gerrit.wikimedia.org/r/323996 (https://phabricator.wikimedia.org/T146055) (owner: 20after4)
[00:03:13] <godog>	 hehe fair
[00:05:05] <kaldari>	 greg-g: Should I reschedule my SWAT patch for tomorrow? It depends on the train deployment.
[00:05:10] <grrrit-wm>	 (CR) Dzahn: [C: 2] Phabricator: Don't use vcs group, use phd [puppet] - https://gerrit.wikimedia.org/r/323996 (https://phabricator.wikimedia.org/T146055) (owner: 20after4)
[00:09:18] <grrrit-wm>	 (CR) Dzahn: "Notice: /Stage[main]/Phabricator/Phabricator::Conf_env[vcs]/File[/srv/phab/phabricator/conf/local/vcs.json]/group: group changed 'vcs' to " [puppet] - https://gerrit.wikimedia.org/r/323996 (https://phabricator.wikimedia.org/T146055) (owner: 20after4)
[00:10:19] <wikibugs>	 Operations, Discovery, Maps: Investigate Swift as a storage backend for maps tiles - https://phabricator.wikimedia.org/T149885#2833686 (fgiunchedi) p:Triage>Normal >>! In T149885#2768978, @MaxSem wrote: > Now that we know our space requirements are still low, we can investigate our options fu...
[00:10:39] <kaldari>	 _joe_: Any update on things? It's currently SWAT deployment time, but looks like the train was blocked by T151702. Should I reschedule SWAT patches for tomorrow instead?
[00:10:40] <stashbot>	 T151702: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702
[00:12:37] <bd808>	 kaldari: group0 is at wmf.4
[00:12:47] <kaldari>	 oh cool
[00:13:00] <bd808>	 https://tools.wmflabs.org/versions/
[00:13:28] <bd808>	 there was a little hiccup but o.striches handled it
[00:13:59] <kaldari>	 bd808: that's a handy tool
[00:14:35] <bd808>	 it is indeed. some master craftsman must have made it ;)
[00:14:51] <wikibugs_>	 Operations, Deployment-Systems, Release-Engineering-Team: Trebuchet targets for test/testrepo are out of date - https://phabricator.wikimedia.org/T149180#2833693 (fgiunchedi) p:Triage>Low
[00:14:54] <kaldari>	 no doubt :)
[00:15:27] <bd808>	 if you click on the version numbers it will show you the wikis in that group too
[00:15:38] <wikibugs>	 Operations, Wikimedia-Logstash: fix partition scheme for logstash ingester hosts - https://phabricator.wikimedia.org/T150108#2833694 (fgiunchedi) p:Triage>Normal
[00:15:40] <kaldari>	 nice
[00:16:17] <kaldari>	 bd808: Where is everybody? This channel is a ghosttown today.
[00:16:56] <godog>	 nice color palette too
[00:17:30] <bd808>	 kaldari: *shrug* making charitable donations?
[00:17:38] <kaldari>	 no doubt
[00:18:40] <grrrit-wm>	 (PS2) Kaldari: Test cookie blocking on Test Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/322154 (https://phabricator.wikimedia.org/T150991)
[00:26:11] <grrrit-wm>	 (PS1) Filippo Giunchedi: logstash: switch to /srv partitioning for ingester hosts [puppet] - https://gerrit.wikimedia.org/r/324362 (https://phabricator.wikimedia.org/T150108)
[00:26:16] <godog>	 bd808: how much of a problem is it ATM to reimage logstash ingester hosts? for ^
[00:26:32] <godog>	 IIRC it wasn't behind pybal yet ?
[00:26:52] <bd808>	 no, and it probably won't be 
[00:27:22] <bd808>	 several of the protocols we use are udp based and use multiple packets per message
[00:27:41] <bd808>	 so all the UDP needs to go to the same host
[00:27:52] <bd808>	 for a given protocol
[00:27:58] <kaldari>	 anyone object if I deploy my config change on test Wikipedia (https://gerrit.wikimedia.org/r/#/c/322154)? Looks like no one's doing SWAT deployments today.
[00:28:16] <bd808>	 kaldari: go for it
[00:28:19] <kaldari>	 plus I haven't broken the sites in a while
[00:28:47] <grrrit-wm>	 (CR) Kaldari: [C: 2] Test cookie blocking on Test Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/322154 (https://phabricator.wikimedia.org/T150991) (owner: Kaldari)
[00:28:53] <bd808>	 godog: we can tweak the mediawiki config pretty easily to migrate traffic to a subset of hosts
[00:29:07] <godog>	 bd808: ah, lvs can do source hashing though for that
[00:29:20] <grrrit-wm>	 (Merged) jenkins-bot: Test cookie blocking on Test Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/322154 (https://phabricator.wikimedia.org/T150991) (owner: Kaldari)
[00:29:24] <bd808>	 but hhvm and other log sources are basically hard coded to particular backends
[00:29:36] <bd808>	 godog: oh? somebody should test that out then :)
[00:30:06] <kaldari>	 bd808: We're still using tin to do all the syncing from, right?
[00:30:22] <kaldari>	 just want to double check :)
[00:30:22] <bd808>	 godog: we can't play with lvs in beta cluster due to the OpenStack network layer so changes like that are blind to me
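The source-hashing idea mentioned above can be sketched roughly as follows. This is only an illustration of the general technique, not how LVS implements it: the `pick_backend` function, the use of `cksum` as the hash, and the addresses and backend names are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of source hashing: map a client address to one backend
# so that every packet from that client reaches the same host.
# pick_backend and the backend names are illustrative, not real LVS config.

pick_backend() {
    src_ip=$1
    shift                      # the remaining arguments are the backends
    # cksum prints "<crc> <byte count>"; keep only the CRC as the hash.
    hash=$(printf '%s' "$src_ip" | cksum | cut -d ' ' -f 1)
    index=$(( hash % $# ))     # stable index in the range 0..(backends - 1)
    shift "$index"
    printf '%s\n' "$1"
}

# The same source always maps to the same backend, so the pieces of a
# multi-packet UDP message are not split across hosts.
pick_backend 10.64.0.10 backend-a backend-b backend-c
pick_backend 10.64.0.10 backend-a backend-b backend-c
```

Both calls print the same backend name because the choice depends only on the source address and the backend list. Changing the list can move a source to a different backend, which is one reason real balancers use more careful schemes.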
[00:30:41] <bd808>	 kaldari: yeah. and there is a new step to try things out before breaking the whole cluster
[00:30:47] <bd808>	 kaldari: let me find the link
[00:32:16] <bd808>	 kaldari: https://wikitech.wikimedia.org/wiki/SWAT_deploys#Doing_the_deploy
[00:32:30] <bd808>	 oops mw1099 needs to be changed there
[00:33:16] <bd808>	 the test host is mwdebug1002 now
[00:33:32] <kaldari>	 thanks
[00:33:56] <godog>	 bd808: indeed, we could add another service for logstash ingestion to the existing logstash.svc
[00:34:04] <bd808>	 so fetch on tin like always, scap pull on mwdebug1002, test with X-Wikimedia-Debug, scap sync-file or whatever
[00:34:51] <bd808>	 godog: if you can get things moved to have a balancer in front of the logstash service that would be awesome
[00:35:17] <bd808>	 right now there are some things pinned to each of the 3 physical hosts
[00:35:55] <bd808>	 hhvm syslog to 01, restbase to 03, *something* to 02
[00:36:11] <bd808>	 mediawiki is the only thing that spreads out over all 3
[00:36:34] <wikibugs_>	 Operations, Wikimedia-Logstash: Move logstash ingestion behind LVS - https://phabricator.wikimedia.org/T151971#2833764 (fgiunchedi)
[00:36:47] <kaldari>	 bd808: cded to /srv/mediawiki-staging, did git fetch, but git diff wmf-config shows nothing. Am I missing a step?
[00:36:50] <wikibugs>	 Operations, Wikimedia-Logstash: Move logstash ingestion behind LVS - https://phabricator.wikimedia.org/T151971#2833776 (fgiunchedi) p:Triage>Normal
[00:37:02] <kaldari>	 change is merged: https://gerrit.wikimedia.org/r/#/c/322154/
[00:37:40] <godog>	 bd808: yeah I can get things lined up over time but not deployed until after the holidays
[00:38:00] <bd808>	 kaldari: git log --stat HEAD..@{upstream} will show you what is fetched but not staged in the index
[00:38:23] <bd808>	 kaldari: then git rebase @{upstream} to actually apply the pending changes
[00:38:37] <bd808>	 godog: *nod*
[00:38:40] <kaldari>	 "git diff HEAD origin" show it
[00:38:43] <kaldari>	 shows it
[00:39:01] <wikibugs_>	 Operations, Wikimedia-Logstash, Patch-For-Review: fix partition scheme for logstash ingester hosts - https://phabricator.wikimedia.org/T150108#2833778 (fgiunchedi)
[00:39:03] <wikibugs>	 Operations, Wikimedia-Logstash: Move logstash ingestion behind LVS - https://phabricator.wikimedia.org/T151971#2833777 (fgiunchedi)
[00:39:17] <bd808>	 kaldari: yup that's another random way to look at the diff
[00:39:35] <bd808>	 so then rebase to get it applied and you're ready to sync things
[00:40:06] <kaldari>	 that works
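The fetch-then-rebase sequence discussed above can be reproduced in a throwaway setup. The sketch below is an illustration only: the `upstream` and `staging` repositories, the `settings.txt` file, and the demo identity are all made up and do not reflect the real /srv/mediawiki-staging checkout. It shows why a plain `git diff` prints nothing right after `git fetch`, and how `HEAD..@{upstream}` lists the pending commits before the rebase applies them.

```shell
#!/bin/sh
# Throwaway demonstration. The repository layout, file name and identity
# below are made up for illustration and are not the real staging setup.
set -e

# Supply an identity so the commits work on a machine without git config.
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the merged repository, "staging" for the checkout.
git init -q upstream
cd upstream
echo 'v1' > settings.txt
git add settings.txt
git commit -q -m 'initial settings'
cd ..
git clone -q upstream staging

# A change is merged upstream after the clone was made.
cd upstream
echo 'v2' > settings.txt
git commit -q -a -m 'update settings'
cd ../staging

git fetch -q
# fetch only moves the remote-tracking branch; the working tree still holds
# v1, so a plain "git diff" against the index prints nothing here.
git --no-pager diff

# Show what has been fetched but is not yet part of the local branch.
git --no-pager log --stat 'HEAD..@{upstream}'

# Apply the fetched commits to the local branch and working tree.
git rebase -q '@{upstream}'
cat settings.txt
```

After the rebase the local branch and the remote-tracking branch point at the same commit, so the `HEAD..@{upstream}` range is empty and the working tree contains the upstream change.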
[00:40:27] <mutante>	 !log phab2001 - enabled puppet to bring it up2date with various changes
[00:40:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:40:38] <bd808>	 godog: back to your original question, it will cause us to lose some logs but it's not the end of the world
[00:41:17] <bd808>	 godog: would it be a full reimage of the hosts?
[00:41:36] * bd808 probably has things in ~ that aren't backed up
[00:41:50] <kaldari>	 bd808: permission denied for ssh mwdebug1002
[00:42:40] <bd808>	 kaldari: from your laptop?
[00:42:59] <kaldari>	 from tin
[00:43:15] <kaldari>	 I'll try from laptop
[00:43:26] <bd808>	 If you want to jump over from tin you can do `SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@mwdebug1002.eqiad.wmnet`
[00:43:53] <kaldari>	 cool, thanks!
[00:44:25] <bd808>	 the prod bastions don't allow agent forwarding anymore so you have to either come in from the outside to each host or cheat and use the scap ssh agent
[00:45:28] <kaldari>	 synced on 1002, testing...
[00:46:04] <kaldari>	 bd808: oh yeah, I guess agent forwarding wasn't a good idea :)
[00:49:18] <godog>	 bd808: yeah, but not urgent at all, just OCD
[00:50:08] <icinga-wm>	 PROBLEM - HHVM processes on mw1276 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm
[00:50:08] <icinga-wm>	 PROBLEM - HHVM rendering on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50371 bytes in 0.002 second response time
[00:50:11] <bd808>	 the most useful thing on those boxes is my ~/.bash_history so I don't have to remember how to do things; just grep
[00:51:08] <icinga-wm>	 RECOVERY - HHVM processes on mw1276 is OK: PROCS OK: 6 processes with command name hhvm
[00:51:08] <icinga-wm>	 RECOVERY - HHVM rendering on mw1276 is OK: HTTP OK: HTTP/1.1 200 OK - 71534 bytes in 0.131 second response time
[00:52:53] <TimStarling>	 !log on mw1276: tuning jemalloc, will restart hhvm several times, running it in a terminal
[00:53:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:53:16] <logmsgbot>	 !log kaldari@tin Synchronized wmf-config/InitialiseSettings.php: sync InitialiseSettings to test cookie blocking on Test Wikipedia (duration: 00m 45s)
[00:53:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:57:26] <icinga-wm>	 RECOVERY - cassandra-b CQL 10.192.48.69:9042 on restbase2012 is OK: TCP OK - 0.036 second response time on 10.192.48.69 port 9042
[00:59:59] <wikibugs_>	 Operations, MediaWiki-Configuration, Performance-Team, Services (watching), and 5 others: Integrating MediaWiki (and other services) with dynamic configuration - https://phabricator.wikimedia.org/T149617#2833815 (Krinkle) >>! In T149617#2832982, @aaron wrote: > The background process would write...
[01:00:41] <grrrit-wm>	 (PS1) Dzahn: phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369
[01:01:36] <icinga-wm>	 PROBLEM - PHD should be supervising processes on phab2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd)
[01:02:46] <icinga-wm>	 PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 49 seconds ago with 1 failures. Failed resources (up to 3 shown): File[/etc/systemd/system/ssh-phab.service]
[01:03:24] <twentyafterfour>	 hmm why is phab2001 alerting
[01:03:56] <icinga-wm>	 PROBLEM - Check systemd state on mw1276 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:04:06] <mutante>	 twentyafterfour: that's me
[01:04:14] <mutante>	 twentyafterfour: i was talking in releng
[01:04:22] <grrrit-wm>	 (CR) 20after4: [C: 1] phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369 (owner: Dzahn)
[01:04:30] <mutante>	 :) yes, and that should fix it
[01:04:32] <mutante>	 thanks
[01:04:36] <twentyafterfour>	 mutante: cool
[01:04:56] <mutante>	 also, puppet ran on that and it got a bunch of other updates
[01:05:05] <mutante>	 confirmed that it didn't break git-ssh
[01:05:08] <twentyafterfour>	 cool
[01:05:54] <twentyafterfour>	 we should be very nearly ready to make phab2001 be a real hot spare for repositories and warm backup for web
[01:06:24] <paladox>	 twentyafterfour it seems phab ssh service is failing puppet
[01:06:33] <paladox>	 Error: /Stage[main]/Phabricator::Vcs/File[/etc/systemd/system/ssh-phab.service]: Could not evaluate: Could not retrieve information from environment production source(s) puppet:///modules/phabricator/sshd-phab.service
[01:06:56] <twentyafterfour>	 paladox: I think mutante just fixed that with https://gerrit.wikimedia.org/r/#/c/324369/1
[01:07:08] <paladox>	 oh
[01:07:10] <paladox>	 thanks
[01:08:06] <grrrit-wm>	 (CR) Paladox: [C: 1] "Good notice, I just noticed it just now :)" [puppet] - https://gerrit.wikimedia.org/r/324369 (owner: Dzahn)
[01:08:38] <grrrit-wm>	 (PS1) Filippo Giunchedi: lvs: add logstash [puppet] - https://gerrit.wikimedia.org/r/324371 (https://phabricator.wikimedia.org/T151971)
[01:10:33] <grrrit-wm>	 (PS2) Dzahn: phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369
[01:11:44] <grrrit-wm>	 (PS3) Dzahn: phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369 (https://phabricator.wikimedia.org/T137928)
[01:11:50] <grrrit-wm>	 (PS4) Dzahn: phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369 (https://phabricator.wikimedia.org/T137928)
[01:13:08] <grrrit-wm>	 (CR) Dzahn: [C: 2] phab: fix systemd unit file name of ssh-phab [puppet] - https://gerrit.wikimedia.org/r/324369 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[01:13:55] <grrrit-wm>	 (PS1) Filippo Giunchedi: templates: add PTR for pdfrender [dns] - https://gerrit.wikimedia.org/r/324372
[01:13:57] <grrrit-wm>	 (PS1) Filippo Giunchedi: templates: add logstash.svc [dns] - https://gerrit.wikimedia.org/r/324373 (https://phabricator.wikimedia.org/T151971)
[01:14:13] <wikibugs>	 Operations, Parsoid, Release-Engineering-Team: Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#2833841 (fgiunchedi) p:Triage>Normal
[01:14:36] <paladox>	 mutante twentyafterfour phabricator now works on labs
[01:14:40] <paladox>	 no puppet errors
[01:14:43] <paladox>	 :)
[01:14:46] <icinga-wm>	 RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[01:15:23] <wikibugs>	 Operations: Upgrade qemu on ganeti clusters to 2.7 - https://phabricator.wikimedia.org/T150532#2833846 (fgiunchedi) p:Triage>Normal
[01:15:26] <mutante>	 paladox: :) yay
[01:15:34] <wikibugs_>	 Operations: Puppet CA rollover - https://phabricator.wikimedia.org/T150823#2833847 (fgiunchedi) p:Triage>Normal
[01:15:47] <paladox>	 Yep, we can finally move away from the phabricator labs class
[01:16:29] * mutante awards a token
[01:16:50] <paladox>	 LOL, :):):):):):):)
[01:17:06] <icinga-wm>	 PROBLEM - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:17:12] <wikibugs_>	 Operations, DC-Ops: Racktables equipment that should probably be renamed ? - https://phabricator.wikimedia.org/T150744#2833860 (fgiunchedi) p:Triage>Normal
[01:17:27] <paladox>	 mutante twentyafterfour only thing left is to make the domain configurable in the apache file
[01:17:34] <paladox>	 but i have to go now
[01:19:13] <mutante>	 paladox: great, continue tomorrow please, thanks
[01:19:21] <mutante>	 cu later
[01:19:22] <paladox>	 Ok
[01:19:29] <paladox>	 and you too :)
[01:19:56] <icinga-wm>	 RECOVERY - Check systemd state on mw1276 is OK: OK - running: The system is fully operational
[01:20:07] <mutante>	 we can move on with rest of  T137928 now i think
[01:20:08] <stashbot>	 T137928: Deploy phabricator to phab2001.codfw.wmnet - https://phabricator.wikimedia.org/T137928
[01:20:19] <paladox>	 ok
[01:20:20] <paladox>	 :)
[01:20:22] <mutante>	 since the networking stuff is unblocked
[01:20:25] <mutante>	 or should be
[01:20:26] <paladox>	 yep
[01:28:52] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on phab2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. daniel_zahn https://phabricator.wikimedia.org/T137928
[01:28:52] <icinga-wm>	 ACKNOWLEDGEMENT - PHD should be supervising processes on phab2001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (phd) daniel_zahn https://phabricator.wikimedia.org/T137928
[01:55:06] <wikibugs>	 Operations, Mail: Update legal-tm-vio@ alias - https://phabricator.wikimedia.org/T150463#2833914 (fgiunchedi) Open>Resolved p:Triage>Normal a:fgiunchedi @Slaporte I've added both to `legal-tm-vio@` now, note that `trademark@` is a group so if recipients are in both they would get mail...
[01:56:39] <wikibugs>	 Operations, puppet-compiler, User-Joe: puppet compiler fails with modules using puppetdb - https://phabricator.wikimedia.org/T150456#2833919 (fgiunchedi) p:Triage>Normal
[01:59:20] <wikibugs_>	 Operations, Parsing-Team, Release-Engineering-Team, HHVM, and 2 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2833920 (tstarling) >>! In T151702#2831448, @Joe wrote: > From a quick look, most threads seem effectively blocked in a very simple function: >  > ``` > je...
[02:07:35] <wikibugs_>	 Operations, Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2833935 (fgiunchedi) p:Triage>Normal `/tmp` keeps getting full with temporary directories that are never cleaned up. Interestingly all files in there are either one byte or 4194304 bytes so some...
[02:07:35] <hoo>	 !log Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds
[02:07:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:07:47] <stashbot>	 T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839
[02:08:16] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: m2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:08:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: m3 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:08:16] <icinga-wm>	 PROBLEM - MariaDB Slave IO: s2 on dbstore1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:09:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: m2 on dbstore1001 is OK: OK slave_sql_state not a slave
[02:09:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: m3 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[02:09:06] <icinga-wm>	 RECOVERY - MariaDB Slave IO: s2 on dbstore1001 is OK: OK slave_io_state Slave_IO_Running: Yes
[02:10:37] <wikibugs>	 Operations, Discovery, Wikidata, Wikidata-Query-Service: Wikidata Query Service is overly verbose toward logstash - https://phabricator.wikimedia.org/T150356#2833943 (fgiunchedi) p:Triage>Normal
[02:10:48] <wikibugs_>	 Operations, Security-Team: icinga notification if elevated writing to badpass.log - https://phabricator.wikimedia.org/T150300#2833944 (fgiunchedi) p:Triage>Normal
[02:10:57] <wikibugs>	 Operations, Analytics: Install java 8 to stat1002 - https://phabricator.wikimedia.org/T151896#2833945 (fgiunchedi) p:Triage>Normal
[02:15:04] <wikibugs_>	 Operations, Analytics: Install java 8 to stat1002 - https://phabricator.wikimedia.org/T151896#2833946 (fgiunchedi) IIRC the alternatives should already prefer java-7 if both are installed, I'm not sure to which puppet class/role to add the package though (cc @Ottomata @elukey )
[02:15:17] <wikibugs>	 Operations, Electron-PDFs, TCB-Team, Patch-For-Review, User-notice: Deploy ElectronPdfService Extension to production - https://phabricator.wikimedia.org/T150185#2833948 (fgiunchedi) p:Triage>Normal
[02:29:56] <wikibugs_>	 Operations, Wikimedia-Extension-setup, I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2730181 (MaxSem) Uh, this renderer has not only Chinese documentation and comments, but even identifiers are in Chinese in some places. To me, this means that (almost?) n...
[02:33:19] <wikibugs>	 Operations: reinstall rcs100[12] with RAID - https://phabricator.wikimedia.org/T140441#2464918 (fgiunchedi) Looks like both machines might actually have only one disk. Both machines have been out of warranty since 2014, we can probably move at least one to a VM
[03:24:56] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 738.75 seconds
[03:47:56] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 292.72 seconds
[04:05:46] <icinga-wm>	 PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:34:46] <icinga-wm>	 RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:45:26] <icinga-wm>	 PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:55:06] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[04:56:06] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4755300 keys, up 29 days 20 hours - replication_delay is 49
[04:59:06] <icinga-wm>	 PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:13:26] <icinga-wm>	 RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[05:13:33] <wikibugs_>	 Operations, Parsing-Team, Release-Engineering-Team, HHVM, and 2 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2834167 (tstarling) Filed upstream bug https://github.com/facebook/hhvm/issues/7515 , but we're not blocked on it, we can use the MALLOC_CONF environment v...
[05:26:06] <icinga-wm>	 RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[05:40:42] <grrrit-wm>	 (CR) Krinkle: [C: -1] "Per IRC. docroot/mobileportal is symlinked to m.wikipedia.org and mobilelanding.php is used in various places." [mediawiki-config] - https://gerrit.wikimedia.org/r/323999 (owner: Chad)
[05:41:33] <grrrit-wm>	 (Abandoned) Krinkle: Remove bits.wikimedia.org apache config [puppet] - https://gerrit.wikimedia.org/r/322420 (https://phabricator.wikimedia.org/T107430) (owner: Alex Monk)
[05:42:08] <grrrit-wm>	 (PS3) Krinkle: Remove bits docroot [mediawiki-config] - https://gerrit.wikimedia.org/r/317657 (owner: Chad)
[05:49:26] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=986.80 Read Requests/Sec=356.00 Write Requests/Sec=3.50 KBytes Read/Sec=44060.00 KBytes_Written/Sec=90.40
[05:57:26] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=10.80 Read Requests/Sec=10.60 Write Requests/Sec=225.40 KBytes Read/Sec=70.80 KBytes_Written/Sec=2885.60
[06:27:06] <icinga-wm>	 PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:35:06] <icinga-wm>	 PROBLEM - puppet last run on ocg1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tree]
[06:37:12] <Debra>	 RIP, bits.
[06:45:26] <icinga-wm>	 RECOVERY - Disk space on labtestnet2001 is OK: DISK OK
[06:53:11] <grrrit-wm>	 (PS1) Yuvipanda: statistics: use R from jessie-backports on jessie boxes [puppet] - https://gerrit.wikimedia.org/r/324384
[06:56:06] <icinga-wm>	 RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[07:00:14] <grrrit-wm>	 (PS2) Yuvipanda: statistics: use R from jessie-backports on jessie boxes [puppet] - https://gerrit.wikimedia.org/r/324384
[07:02:06] <icinga-wm>	 RECOVERY - puppet last run on ocg1003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:03:07] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] statistics: use R from jessie-backports on jessie boxes [puppet] - https://gerrit.wikimedia.org/r/324384 (owner: Yuvipanda)
[07:06:14] <marostegui>	 !log Stop mysql db2048 maintenance - T149553
[07:06:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:26] <stashbot>	 T149553: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553
[07:17:32] <marostegui>	 !log Deploy alter table dbstore1002 - dewiki.revision - T148967
[07:17:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:44] <stashbot>	 T148967: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967
[08:01:45] <wikibugs>	 Operations, ops-eqiad, hardware-requests: Return wmf4747/wmf4748/wmf4749/wmf4750 to spares - https://phabricator.wikimedia.org/T146171#2834278 (Joe) I'm not clear if there is anything I should do about this ticket
[08:42:12] <grrrit-wm>	 (PS1) Jcrespo: mariadb: change check_private_data to print DROP statements [puppet] - https://gerrit.wikimedia.org/r/324386 (https://phabricator.wikimedia.org/T147052)
[08:42:28] <grrrit-wm>	 (PS2) Jcrespo: mariadb: change check_private_data to print DROP statements [puppet] - https://gerrit.wikimedia.org/r/324386 (https://phabricator.wikimedia.org/T147052)
[08:43:43] <grrrit-wm>	 (CR) Marostegui: "nice change, a lot easier to handle the future drops directly from the output!" [puppet] - https://gerrit.wikimedia.org/r/324386 (https://phabricator.wikimedia.org/T147052) (owner: Jcrespo)
[08:43:52] <grrrit-wm>	 (CR) Marostegui: [C: 1] mariadb: change check_private_data to print DROP statements [puppet] - https://gerrit.wikimedia.org/r/324386 (https://phabricator.wikimedia.org/T147052) (owner: Jcrespo)
[08:44:09] <grrrit-wm>	 (CR) Jcrespo: [C: 2] mariadb: change check_private_data to print DROP statements [puppet] - https://gerrit.wikimedia.org/r/324386 (https://phabricator.wikimedia.org/T147052) (owner: Jcrespo)
[08:44:31] <_joe_>	 !log stopped dedicated commonswiki jobrunner T151196
[08:44:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:43] <stashbot>	 T151196: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196
[08:51:40] <grrrit-wm>	 (PS2) Volans: RAID: reduce MegaCLI sensibility (physical disks) [puppet] - https://gerrit.wikimedia.org/r/324240 (https://phabricator.wikimedia.org/T151043)
[09:04:24] <wikibugs_>	 Operations, Wikimedia-General-or-Unknown, Availability, Patch-For-Review, and 2 others: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196#2834319 (Ankry) >>! In T151196#2823445, @matmarex wrote: > Until the normal job processing is fixed to cope with th...
[09:07:46] <grrrit-wm>	 (PS1) Aaron Schulz: Bump $wgJobBackoffThrottling for cache purges [mediawiki-config] - https://gerrit.wikimedia.org/r/324388
[09:13:19] <wikibugs_>	 Operations, OCG-General: ocg alarm ocg_job_status_queue 'flapping' - https://phabricator.wikimedia.org/T97524#1245233 (Volans) The alarm is on again since 3 days on Icinga, and looking at the last 6 months trend it seems that the alarm might need some re-tuning if the trend is legitimate and not an indic...
[09:28:16] <grrrit-wm>	 (PS6) Elukey: Refactor the parsing functions out of the main C file [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/322257 (https://phabricator.wikimedia.org/T147440)
[09:37:56] <wikibugs_>	 Operations, Parsoid, Release-Engineering-Team: Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#2792988 (Legoktm) A new directory can be created by defining it in puppet: https://github.com/wikimedia/operations-puppet/blob/production/modules/releases/...
[09:38:42] <wikibugs>	 Operations, Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2824625 (ema) @doctaxon still happening?
[09:42:11] <grrrit-wm>	 (PS1) Jcrespo: mariadb: fix bugs with check_private_data regarding DROP and NULL [puppet] - https://gerrit.wikimedia.org/r/324390 (https://phabricator.wikimedia.org/T147052)
[09:42:52] <marostegui>	 !log Stop mysql on db2048 for maintenance - https://phabricator.wikimedia.org/T149553
[09:43:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:11] <wikibugs>	 Operations, Traffic, Patch-For-Review: Huge increase in cache_upload 404s due to buggy client-side code from graphiq.com - https://phabricator.wikimedia.org/T151444#2834370 (ema) p:Normal>Low The amount of upload 404s decreased significantly, we're almost back to normal: https://grafana.wikim...
[09:46:41] <grrrit-wm>	 (CR) Zfilipin: "I cherry picked it at the CI puppet master. Will puppet run automatically, or should I run it manually?" [puppet] - https://gerrit.wikimedia.org/r/324203 (https://phabricator.wikimedia.org/T117418) (owner: Zfilipin)
[09:47:26] <grrrit-wm>	 (CR) Marostegui: [C: 1] mariadb: fix bugs with check_private_data regarding DROP and NULL [puppet] - https://gerrit.wikimedia.org/r/324390 (https://phabricator.wikimedia.org/T147052) (owner: Jcrespo)
[09:48:20] <grrrit-wm>	 (CR) Jcrespo: [C: 2] mariadb: fix bugs with check_private_data regarding DROP and NULL [puppet] - https://gerrit.wikimedia.org/r/324390 (https://phabricator.wikimedia.org/T147052) (owner: Jcrespo)
[10:15:26] <grrrit-wm>	 (PS1) Giuseppe Lavagetto: mediawiki::hhvm: allow to override the default jemalloc arenas [puppet] - https://gerrit.wikimedia.org/r/324394 (https://phabricator.wikimedia.org/T151702)
[10:16:37] <grrrit-wm>	 (PS1) ArielGlenn: miscdumps: make refresh interval for lock a few seconds shorter than stale time [dumps] - https://gerrit.wikimedia.org/r/324395
[10:17:19] <grrrit-wm>	 (CR) ArielGlenn: [C: 2] miscdumps: make refresh interval for lock a few seconds shorter than stale time [dumps] - https://gerrit.wikimedia.org/r/324395 (owner: ArielGlenn)
[10:22:01] <wikibugs_>	 Operations, netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2834399 (elukey)
[10:22:16] <wikibugs>	 Operations, netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2834411 (elukey) p:Triage>Normal
[10:25:31] <grrrit-wm>	 (PS2) Giuseppe Lavagetto: mediawiki::hhvm: allow to override the default jemalloc arenas [puppet] - https://gerrit.wikimedia.org/r/324394 (https://phabricator.wikimedia.org/T151702)
[10:33:44] <wikibugs_>	 Operations, netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2834422 (elukey) Had a chat with Alex on IRC about what stat1001 does and what level of access it should have. For the Apache VHosts point of view it would be better to have...
[10:35:08] <grrrit-wm>	 (CR) Volans: prometheus: add vhtcpd stats via node-exporter (1 comment) [puppet] - https://gerrit.wikimedia.org/r/323559 (https://phabricator.wikimedia.org/T147429) (owner: Filippo Giunchedi)
[10:37:23] <wikibugs>	 Operations, Analytics, netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2834430 (elukey)
[10:48:21] <grrrit-wm>	 (PS3) Zfilipin: ChromeDriver should be in PATH for jobs that run Selenium tests [puppet] - https://gerrit.wikimedia.org/r/324203 (https://phabricator.wikimedia.org/T117418)
[10:48:46] <icinga-wm>	 PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:50:24] <wikibugs>	 06Operations, 10Ops-Access-Requests, 06Discovery, 06Maps, and 2 others: Requesting access to analytics-privatedata-users for technical user discovery-stats - https://phabricator.wikimedia.org/T151063#2834456 (10Gehel) 05Open>03Resolved It looks like the script is now performing correctly, no error seen...
[10:51:22] <grrrit-wm>	 (03CR) 10Hashar: [C: 031] "Got cherry picked on the CI puppet master and that provisioned the symbolic links on the permanent slaves." [puppet] - 10https://gerrit.wikimedia.org/r/324203 (https://phabricator.wikimedia.org/T117418) (owner: 10Zfilipin) 
[10:52:07] <grrrit-wm>	 (03PS1) 10MarcoAurelio: Allow contentadmin and sysop to add/remove autopatrolled users on Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324401 
[10:52:12] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) 
[10:52:27] <grrrit-wm>	 (03PS3) 10Giuseppe Lavagetto: mediawiki::hhvm: allow to override the default jemalloc arenas [puppet] - 10https://gerrit.wikimedia.org/r/324394 (https://phabricator.wikimedia.org/T151702) 
[10:53:08] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) (owner: 10Jcrespo) 
[10:54:59] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) 
[10:58:25] <grrrit-wm>	 (03PS3) 10Jcrespo: mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) 
[10:59:26] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::hhvm: allow to override the default jemalloc arenas [puppet] - 10https://gerrit.wikimedia.org/r/324394 (https://phabricator.wikimedia.org/T151702) (owner: 10Giuseppe Lavagetto) 
[10:59:37] <grrrit-wm>	 (03PS4) 10Jcrespo: mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) 
[11:00:33] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: More bugfixes for check_private_data.py [puppet] - 10https://gerrit.wikimedia.org/r/324402 (https://phabricator.wikimedia.org/T147052) (owner: 10Jcrespo) 
[11:07:23] <_joe_>	 !log rolling upgrade of hhvm on the eqiad api cluster
[11:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:07:39] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Fix final check_private_data.py quoting bugs [puppet] - 10https://gerrit.wikimedia.org/r/324404 
[11:08:10] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: Fix final check_private_data.py quoting bugs [puppet] - 10https://gerrit.wikimedia.org/r/324404 
[11:08:46] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Huge increase in cache_upload 404s due to buggy client-side code from graphiq.com - https://phabricator.wikimedia.org/T151444#2834521 (10ema) Leaving this ticket open though given that they still haven't fixed the javascript bug. The decrease in 404s is probably d...
[11:08:52] <volans>	 jynus: can you ping me when you have to merge this? ^^^ same test of yesterday ;)
[11:09:04] <jynus>	 as soon as it +2
[11:09:14] <jynus>	 (jenkins does)
[11:09:39] <jynus>	 so, now
[11:09:52] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: Fix final check_private_data.py quoting bugs [puppet] - 10https://gerrit.wikimedia.org/r/324404 (owner: 10Jcrespo) 
[11:10:05] <jynus>	 volans^
[11:10:05] <volans>	 ok, merge on gerrit, I'll take care of puppet-merge in a second
[11:10:07] <volans>	 thanks
[11:10:10] <grrrit-wm>	 (03CR) 10Faidon Liambotis: [C: 032] RAID: reduce MegaCLI sensibility (physical disks) [puppet] - 10https://gerrit.wikimedia.org/r/324240 (https://phabricator.wikimedia.org/T151043) (owner: 10Volans) 
[11:10:33] <grrrit-wm>	 (03PS1) 10ArielGlenn: miscdumps: fix up config defaults [dumps] - 10https://gerrit.wikimedia.org/r/324406 
[11:10:35] <grrrit-wm>	 (03PS3) 10Volans: RAID: reduce MegaCLI sensibility (physical disks) [puppet] - 10https://gerrit.wikimedia.org/r/324240 (https://phabricator.wikimedia.org/T151043) 
[11:12:31] <volans>	 jynus: puppet-merged, thanks a lot!
[11:12:58] <volans>	 paravoid: FYI all tests on puppet-merge molly guard successful, I'm merging that
[11:13:27] <jynus>	 can I ask the 1-line summary of that functionality?
[11:13:57] <volans>	 sure, if there are commits from multiple committers when you run puppet-merge, instead of saying yes you have to say "multiple" 
[11:14:19] <volans>	 so if you type yes and enter without noticing the warning it aborts the merge
[11:14:24] <volans>	 to prevent muscle memory errors
[11:14:52] <volans>	 ofc it's saying that in the prompt, but I'll send an email too
[11:14:55] <jynus>	 what if I say yes (because there is 1 on check) but on merge there are multiple?
[11:15:08] <jynus>	 does it fix that?
[11:15:56] <volans>	 not yet, also because the thing is async and runs on multiple puppetmasters, so less trivial as a change
[11:16:14] <jynus>	 ok
[11:16:19] <jynus>	 yes, I know it is not simple
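[editor's note] The confirmation rule volans describes above (type "yes" when all pending commits come from one committer, type "multiple" when they come from several, abort otherwise) can be sketched roughly as below. This is only an illustration under stated assumptions: it is not the actual puppet-merge code, and the function name `confirm_merge` and its arguments are hypothetical.

```python
def confirm_merge(committers, answer):
    """Return True if the merge may proceed (hypothetical sketch).

    committers: names of the committers of the pending commits.
    answer: what the operator typed at the confirmation prompt.
    """
    # Assumption: only the number of distinct committers matters.
    distinct = set(committers)
    # With several committers a plain "yes" is rejected, so that
    # muscle memory alone cannot confirm the merge.
    expected = "multiple" if len(distinct) > 1 else "yes"
    return answer.strip().lower() == expected


if __name__ == "__main__":
    # Illustrative committer names, not taken from a real merge.
    print(confirm_merge(["alice"], "yes"))
    print(confirm_merge(["alice", "bob"], "yes"))
    print(confirm_merge(["alice", "bob"], "multiple"))
```

As discussed in the channel, a check like this does not cover the case where the set of commits changes between the check and the merge itself.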
[11:16:20] <marostegui>	 !log Stop replication s3 - db1095 - maintenance - T147052
[11:16:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:31] <stashbot>	 T147052: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052
[11:16:39] <jynus>	 I looked at it and said to myself "not worth the time"
[11:16:50] <grrrit-wm>	 (03PS3) 10Volans: Puppet merge: molly-guard multiple commits [puppet] - 10https://gerrit.wikimedia.org/r/322362 
[11:17:13] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] miscdumps: fix up config defaults [dumps] - 10https://gerrit.wikimedia.org/r/324406 (owner: 10ArielGlenn) 
[11:17:46] <icinga-wm>	 RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[11:18:15] <jynus>	 also if we cherry-pick instead of rebase/pull all sorts of things can go wrong
[11:20:42] <grrrit-wm>	 (03PS1) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[11:20:58] <grrrit-wm>	 (03PS2) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[11:21:26] <icinga-wm>	 PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[11:21:49] <grrrit-wm>	 (03PS3) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[11:21:51] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[11:22:01] <_joe_>	 !log repooling mw1276, after tests for T151702 
[11:22:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:22:11] <stashbot>	 T151702: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702
[11:22:12] <grrrit-wm>	 (03PS4) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[11:23:09] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[11:25:00] <grrrit-wm>	 (03PS1) 10ArielGlenn: update cron command and config file for adds/changes dumps [puppet] - 10https://gerrit.wikimedia.org/r/324409 
[11:25:04] <grrrit-wm>	 (03PS5) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[11:28:13] <logmsgbot>	 !log ariel@tin Starting deploy [dumps/dumps@50689c8]: (no message)
[11:28:19] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@50689c8]: (no message) (duration: 00m 07s)
[11:28:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:28:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:32] <grrrit-wm>	 (03PS2) 10ArielGlenn: update cron command and config file for adds/changes dumps [puppet] - 10https://gerrit.wikimedia.org/r/324409 
[11:34:22] <wikibugs_>	 06Operations, 10DBA: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995#2834580 (10jcrespo)
[11:35:52] <grrrit-wm>	 (03PS3) 10ArielGlenn: update cron command and config file for adds/changes dumps [puppet] - 10https://gerrit.wikimedia.org/r/324409 
[11:36:03] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Drop ssl (tls) options from external storage servers [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) 
[11:36:06] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Add semicolon after each SQL query output (private data check) [puppet] - 10https://gerrit.wikimedia.org/r/324412 (https://phabricator.wikimedia.org/T147052) 
[11:36:20] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: Add semicolon after each SQL query output (private data check) [puppet] - 10https://gerrit.wikimedia.org/r/324412 (https://phabricator.wikimedia.org/T147052) 
[11:37:07] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] update cron command and config file for adds/changes dumps [puppet] - 10https://gerrit.wikimedia.org/r/324409 (owner: 10ArielGlenn) 
[11:37:13] <volans>	 jynus: I guess you and Ariel are the unlucky candidates for my tests after merging :)
[11:38:04] <jynus>	 mmm
[11:38:24] <jynus>	 324411 needs deep review
[11:38:28] <jynus>	 before merging
[11:38:58] <volans>	 I was thinking 324412
[11:39:51] <grrrit-wm>	 (03CR) 10Marostegui: [C: 031] "looks good" [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[11:41:02] <grrrit-wm>	 (03PS3) 10Jcrespo: mariadb: Add semicolon after each SQL query output (private data check) [puppet] - 10https://gerrit.wikimedia.org/r/324412 (https://phabricator.wikimedia.org/T147052) 
[11:42:36] <grrrit-wm>	 (03CR) 10Volans: [C: 031] "LGTM, the delicate part is the rolling restart and reset of SSL parameters in replication, in particular the cross-dc ones." [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[11:44:14] <mafk>	 nuria: hola/hi ping re https://gerrit.wikimedia.org/r/#/c/323699/
[11:44:53] <_joe_>	 !log upgrading HHVM across appservers in eqiad
[11:45:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:47:10] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: Add semicolon after each SQL query output (private data check) [puppet] - 10https://gerrit.wikimedia.org/r/324412 (https://phabricator.wikimedia.org/T147052) (owner: 10Jcrespo) 
[11:47:23] <jynus>	 volans^
[11:47:30] <volans>	 jynus: ok, running it
[11:48:21] <volans>	 jynus: done, thanks again
[11:49:26] <icinga-wm>	 RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[11:50:12] <grrrit-wm>	 (03PS7) 10Elukey: Refactor the parsing functions out of the main C file [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/322257 (https://phabricator.wikimedia.org/T147440) 
[12:00:51] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 031] "Looks good: https://puppet-compiler.wmflabs.org/4720/" [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[12:06:56] <icinga-wm>	 PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:09:33] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: Drop ssl (tls) options from external storage servers [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) 
[12:17:40] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032 V: 032] mariadb: Drop ssl (tls) options from external storage servers [puppet] - 10https://gerrit.wikimedia.org/r/324411 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[12:18:46] <icinga-wm>	 PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:26:35] <wikibugs_>	 06Operations, 10DBA, 13Patch-For-Review: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995#2834676 (10jcrespo) p:05Normal>03High a:03jcrespo
[12:29:38] <jynus>	 !log mysql restart and general upgrade for es2015 T151995
[12:29:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:52] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[12:30:48] <grrrit-wm>	 (03CR) 10Elukey: [C: 04-1] "WIP" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/322256 (https://phabricator.wikimedia.org/T147440) (owner: 10Elukey) 
[12:31:14] <grrrit-wm>	 (03CR) 10Elukey: [C: 04-1] "WIP" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/322257 (https://phabricator.wikimedia.org/T147440) (owner: 10Elukey) 
[12:32:25] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] docker: add package provider (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/323815 (owner: 10Giuseppe Lavagetto) 
[12:35:56] <icinga-wm>	 RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[12:40:21] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643#2834702 (10ema) I've confirmed with stap that the overruns come from vslc_vsm_next, which in turn calls vslc_vsm_check: https://github.com/varnishcache/varnish-ca...
[12:40:51] <grrrit-wm>	 (03CR) 10Volans: "I've run a quick puppet compiler:" [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) 
[12:44:21] <wikibugs>	 06Operations, 10Monitoring: dbstore1001 backup jobs failed between 2016-10-19 and 2016-11-23 - https://phabricator.wikimedia.org/T151579#2834707 (10jcrespo) 05Open>03Resolved a:03jcrespo Last backups seem to have been successful:   ``` ls -lha enwiki* -rw-r----- 1 root root  84G Nov 23 06:46 enwiki-20161...
[12:46:07] <wikibugs_>	 06Operations, 10DBA, 10Monitoring: Create script to monitor db dumps for backups are successful (and if not, old backups are not deleted) - https://phabricator.wikimedia.org/T151999#2834710 (10jcrespo)
[12:47:06] <icinga-wm>	 PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:48:46] <icinga-wm>	 RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[12:55:17] <ema>	 !log bumping vsl log buffer on cp3032 (depooled) -- T151643
[12:55:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:29] <stashbot>	 T151643: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643
[13:08:16] <jynus>	 !log mysql restart and general upgrade for es2019 T151995
[13:08:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:08:28] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[13:09:39] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.3/extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php: More skipping (duration: 01m 34s)
[13:09:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:42] <logmsgbot>	 !log reedy@tin Synchronized php-1.29.0-wmf.4/extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php: More skipping (duration: 00m 44s)
[13:10:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:51] <jynus>	 !log mysql restart and general upgrade for es2017 T151995
[13:16:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:04] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[13:16:06] <icinga-wm>	 RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[13:32:12] <wikibugs>	 06Operations, 10Cassandra, 10RESTBase, 06Services (doing): RESTBase k-r-v as Cassandra anti-pattern (or: revision retention policies considered harmful) - https://phabricator.wikimedia.org/T144431#2834839 (10mark)
[13:38:17] <wikibugs>	 06Operations, 10Wikimedia-Site-requests: Add IPv6 address for dashboard.wikiedu.org to the ratelimit exemptions - https://phabricator.wikimedia.org/T151823#2834851 (10Dereckson)
[13:38:59] <Krenair>	 Dereckson, isn't that in the mw config?
[13:49:09] <Dereckson>	 Krenair: yes, it is, I initially misread the request to add an IPv6 on a wmf server
[13:55:18] <Dereckson>	 !log Reset user email for projectcomwiki initial account "Mjohnson (WMF)"
[13:55:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:58:36] <icinga-wm>	 PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T1400).
[14:00:04] <jouncebot>	 Urbanecm, kart_, and mafk: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:24] * Urbanecm waves
[14:00:26] <Dereckson>	 Hello, I can SWAT.
[14:00:31] <grrrit-wm>	 (03PS1) 10Dereckson: Add dashboard.wikiedu.org IPv6 to en.wikipedia rate limit exempt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324446 (https://phabricator.wikimedia.org/T151823) 
[14:00:37] <Dereckson>	 I'll also add this change ^
[14:00:51] <kart__>	 I'm here as kart__
[14:01:41] <grrrit-wm>	 (03CR) 10Urbanecm: [C: 031] "Fine for me." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324446 (https://phabricator.wikimedia.org/T151823) (owner: 10Dereckson) 
[14:02:01] <zeljkof>	 Dereckson: have fun! I was just about to ask who wants to swat
[14:03:15] <Urbanecm>	 zeljkof and Dereckson, you both can swat I think :D
[14:03:37] <hashar>	 o/
[14:03:56] <zeljkof>	 Urbanecm: I'll leave it to Dereckson today :)
[14:04:11] <Urbanecm>	 I have no problem with it :). 
[14:05:55] <wikibugs>	 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Decrease time required to fully restart the Cirrus elasticsearch clusters - https://phabricator.wikimedia.org/T145065#2834951 (10Gehel) 05declined>03Open Re-opening this and linking it to upstream ticket: https://github.com/elastic/elastic...
[14:06:45] <jynus>	 !log mysql restart and general upgrade for es2014 T151995
[14:06:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:06:56] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[14:07:22] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: mediawiki: tweak jemalloc arenas on api, appserver canaries [puppet] - 10https://gerrit.wikimedia.org/r/324447 (https://phabricator.wikimedia.org/T151702) 
[14:08:01] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: tweak jemalloc arenas on api, appserver canaries [puppet] - 10https://gerrit.wikimedia.org/r/324447 (https://phabricator.wikimedia.org/T151702) (owner: 10Giuseppe Lavagetto) 
[14:08:32] <grrrit-wm>	 (03PS2) 10Dereckson: [logo] Add logo for arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324188 (https://phabricator.wikimedia.org/T151731) (owner: 10Urbanecm) 
[14:09:24] <grrrit-wm>	 (03CR) 10Dereckson: [C: 032] [logo] Add logo for arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324188 (https://phabricator.wikimedia.org/T151731) (owner: 10Urbanecm) 
[14:10:44] <grrrit-wm>	 (03Merged) 10jenkins-bot: [logo] Add logo for arbcom_cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324188 (https://phabricator.wikimedia.org/T151731) (owner: 10Urbanecm) 
[14:10:54] <grrrit-wm>	 (03PS3) 10Dereckson: [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) (owner: 10Urbanecm) 
[14:11:37] <grrrit-wm>	 (03CR) 10Dereckson: [C: 032] [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) (owner: 10Urbanecm) 
[14:12:21] <Dereckson>	 Marco isn't here.
[14:12:54] <grrrit-wm>	 (03Merged) 10jenkins-bot: [logo] Add logo to Wikivoyage Finnish [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323697 (https://phabricator.wikimedia.org/T151571) (owner: 10Urbanecm) 
[14:13:21] <Dereckson>	 How do we test wmgPrivilegedGroups? By asking someone from a group, not yet in another privileged group, if 2FA works?
[14:14:07] <Urbanecm>	 Dereckson, maybe a testaccount can become a member of a group and test it. But I have no access to rights at meta. 
[14:14:22] <Dereckson>	 Urbanecm: your logos are live on mwdebug1002.eqiad.wmnet if you wish to check them at /static/...
[14:14:31] <Urbanecm>	 Yes, going to check them. 
[14:15:37] <Urbanecm>	 Dereckson, you can deploy it to the whole network. 
[14:16:12] <Dereckson>	 By the way, a scap pull on mwdebug1002 still hangs after Finished rsync common
[14:16:31] <Urbanecm>	 Does it mean something may be wrong?
[14:16:59] <Dereckson>	 Yes, but unrelated to the logos.
[14:17:23] <Urbanecm>	 Okay, thanks. 
[14:17:35] <logmsgbot>	 !log dereckson@tin Synchronized static/images/project-logos: New project logos for wiki to create (arbcom cs, fi.wikivoyage) (duration: 00m 46s)
[14:17:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:07] <Urbanecm>	 Dereckson, working, thanks. 
[14:18:34] <grrrit-wm>	 (03PS2) 10Dereckson: Add dashboard.wikiedu.org IPv6 to en.wikipedia rate limit exempt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324446 (https://phabricator.wikimedia.org/T151823) 
[14:18:48] <grrrit-wm>	 (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324446 (https://phabricator.wikimedia.org/T151823) (owner: 10Dereckson) 
[14:19:28] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add dashboard.wikiedu.org IPv6 to en.wikipedia rate limit exempt [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324446 (https://phabricator.wikimedia.org/T151823) (owner: 10Dereckson) 
[14:21:34] <Dereckson>	 kart__: so if I understand https://phabricator.wikimedia.org/T151868#2830467 your fix is only for newest code in wmf.4 and the issue doesn't exist in wmf.3?
[14:22:12] <Dereckson>	 324446 live on mwdebug1002
[14:22:26] <kart__>	 Dereckson: right
[14:22:36] <kart__>	 Dereckson: it will go live later today.
[14:23:56] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add dashboard.wikiedu.org IPv6 to en.wikipedia rate limit exempt (T151823) (duration: 00m 45s)
[14:24:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:07] <stashbot>	 T151823: Add IPv6 address for dashboard.wikiedu.org to the ratelimit exemptions - https://phabricator.wikimedia.org/T151823
[14:27:37] <icinga-wm>	 RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[14:29:07] <wikibugs_>	 06Operations, 10Wikimedia-General-or-Unknown, 07Availability, 13Patch-For-Review, and 2 others: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196#2834996 (10matmarex) So, I guess this is resolved? @joe @aaron Are there any follow-up tasks to be filed, or is T1238...
[14:30:06] <Dereckson>	 kart__: we're waiting for the mwext-testextension-php55 and mwext-testextension-hhvm jobs to run
[14:32:29] <Dereckson>	 Here we are
[14:33:33] <Dereckson>	 kart__: live on mwdebug1002
[14:33:37] <kart__>	 Dereckson: done?
[14:33:59] <Dereckson>	 yes, and I sent your change on the mwdebug1002 server
[14:34:07] <Dereckson>	 This is the server replacing mw1099
[14:34:41] <Dereckson>	 If you use the X Wikimedia Debug extension, it already has been upgraded
[14:34:44] <kart__>	 OK. Let me check if nothing breaks.
[14:34:47] <kart__>	 Yep
[14:35:11] <Dereckson>	 hi mafk 
[14:35:45] <Dereckson>	 I was going to skip your change, as I don't have a lot of ideas about how to test it. Urbanecm offered to create a test account and add it to the group, to check 2FA works.
[14:35:49] <mafk>	 sorry I'm late
[14:36:06] <Reedy>	 I really wouldn't worry about testing it
[14:36:08] <Dereckson>	 ok
[14:36:14] <Reedy>	 Dereckson: Just look at Special:UserGroupRights
[14:36:18] <Reedy>	 Check it's been added to the group
[14:36:20] <Reedy>	 That's enough :)
[14:36:28] <Dereckson>	 oh yes true there is an associated user right for that
[14:36:55] <kart__>	 Dereckson: go ahead, as it's not possible to test the change without fully deployed code.
[14:37:11] <grrrit-wm>	 (03PS3) 10Dereckson: WMF staff local groups to $wmgPrivilegedGroups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324250 (https://phabricator.wikimedia.org/T150951) (owner: 10MarcoAurelio) 
[14:37:30] <mafk>	 Dereckson: there's no need to add anyone, just look at special:listgrouprights if they have the oathauth-enable
[14:37:41] <grrrit-wm>	 (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324250 (https://phabricator.wikimedia.org/T150951) (owner: 10MarcoAurelio) 
[14:37:47] <mafk>	 tell me when ready on mw1099
[14:37:57] <mafk>	 or mwdebug1002 now
[14:39:15] <grrrit-wm>	 (03Merged) 10jenkins-bot: WMF staff local groups to $wmgPrivilegedGroups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324250 (https://phabricator.wikimedia.org/T150951) (owner: 10MarcoAurelio) 
[14:39:19] <Dereckson>	 kart__: syncing
[14:39:57] <Dereckson>	 mafk: live on mwdebug1002
[14:40:00] <logmsgbot>	 !log dereckson@tin Synchronized php-1.29.0-wmf.4/extensions/ContentTranslation/modules/tools/ext.cx.tools.template.js: Allow template editor even if parameter mapping fails completely (T151868) (duration: 00m 45s)
[14:40:05] <mafk>	 checking
[14:40:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:14] <stashbot>	 T151868: Template translation is broken if none of the parameters can be auto-mapped - https://phabricator.wikimedia.org/T151868
[14:40:58] <mafk>	 Dereckson: it's ok on debug
[14:41:20] <Dereckson>	 mafk: ok
[14:41:22] <kart__>	 Dereckson: thanks!
[14:41:26] <Dereckson>	 kart__: works?
[14:41:43] <kart__>	 Dereckson: we will only know later today :)
[14:41:46] <kart__>	 no worries.
[14:41:47] <Dereckson>	 ok
[14:41:58] <kart__>	 Fresh code+SWAT.
[14:42:03] <logmsgbot>	 !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Add WMF staff local groups to $wmgPrivilegedGroups (T150951) (duration: 00m 46s)
[14:42:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:14] <stashbot>	 T150951: Create list of privileged wiki groups - https://phabricator.wikimedia.org/T150951
[14:42:40] <jynus>	 !log mysql restart and general upgrade for es2011 T151995
[14:42:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:52] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[14:44:05] <Dereckson>	 !log EU SWAT done
[14:44:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:14] <hashar>	 \O/
[14:44:48] <marostegui>	 !log Stop MySQL and shutdown db2048 for maintenance - T149553
[14:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:56] <stashbot>	 T149553: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553
[14:52:26] <icinga-wm>	 PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:57:33] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: mediawiki: make jemalloc arenas equal to the processorcount [puppet] - 10https://gerrit.wikimedia.org/r/324462 (https://phabricator.wikimedia.org/T151702) 
[14:58:14] <paravoid>	 not 4*cpucount?
[15:00:45] <_joe_>	 paravoid: looking at Tim's tests, and my own, it won't make much of a difference and I preferred to be a bit conservative given we're waiting to see what the rationale from fb is
[15:00:54] <wikibugs_>	 06Operations, 10Analytics, 10netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2835067 (10Ottomata) Uh OH!  This is SUPPOSED to be in the analytics VLAN!  https://phabricator.wikimedia.org/T149911  Re-opening that ticket.  Sorry, I shoulda...
[15:01:06] <_joe_>	 anyways, it won't make sense raising that much higher than 2*processorcount
[15:01:15] <_joe_>	 or, the number of allowed hhvm threads
[15:01:28] <wikibugs_>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2835068 (10Ottomata) 05Resolved>03Open Uh OH!  @RobH   > This should be installed within the Analytics VLAN, but it does not matter which row.  I think thorium may have had...
[15:01:59] <wikibugs>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2835072 (10Ottomata)
[15:02:02] <wikibugs_>	 06Operations, 10Analytics, 10netops: Thorium (new stat1001) needs to communicate with the Analytics VLAN - https://phabricator.wikimedia.org/T151990#2835074 (10Ottomata)
[15:11:20] <jynus>	 !log mysql restart and general upgrade for es2012 T151995
[15:11:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:31] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[15:11:58] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: make jemalloc arenas equal to the processorcount [puppet] - 10https://gerrit.wikimedia.org/r/324462 (https://phabricator.wikimedia.org/T151702) (owner: 10Giuseppe Lavagetto) 
[15:14:13] <_joe_>	 !log upgrading HHVM on the imagescalers
[15:14:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:18:36] <icinga-wm>	 PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[strace]
[15:19:26] <icinga-wm>	 RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[15:20:36] <icinga-wm>	 RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[15:25:25] <wikibugs>	 06Operations, 10Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2835098 (10doctaxon) @ema * In a window of 2016-11-29  19:00  until  23:59 CET I couldn't receive any 502 Bad Gateway doing a lot of API queries using a permanent query loop.   * I could receive the last 502 Bad Gatewa...
[15:26:20] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2787072 (10ema) The errors mentioned in the ticket description seem to be gone from cp4001.
[15:27:36] <grrrit-wm>	 (03PS1) 10Cmjohnson: Adding mgmt dns entries for restabse1016-1018 T150964 [dns] - 10https://gerrit.wikimedia.org/r/324465 
[15:28:35] <grrrit-wm>	 (03CR) 10Cmjohnson: [C: 032] Adding mgmt dns entries for restabse1016-1018 T150964 [dns] - 10https://gerrit.wikimedia.org/r/324465 (owner: 10Cmjohnson) 
[15:29:10] <jynus>	 !log mysql restart and general upgrade for es2013 T151995
[15:29:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:22] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[15:29:55] <wikibugs>	 06Operations, 10Traffic: several 502 Bad Gateway - https://phabricator.wikimedia.org/T151686#2835114 (10ema) 05Open>03Resolved a:03ema @doctaxon great, thanks for confirming that the problem is solved. The default value for an internal varnish setting was not large enough and that was causing crashes wit...
[15:38:43] <grrrit-wm>	 (03PS1) 10Mforns: Add a reportupdater job for ee-migration [puppet] - 10https://gerrit.wikimedia.org/r/324466 (https://phabricator.wikimedia.org/T126358) 
[15:42:12] <grrrit-wm>	 (03CR) 10Addshore: [C: 04-1 V: 04-1] "Extension configuration has changed" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322086 (https://phabricator.wikimedia.org/T150945) (owner: 10Addshore) 
[15:43:52] <grrrit-wm>	 (03PS1) 10Eevans: enable instance restbase2012-c.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/324469 (https://phabricator.wikimedia.org/T151086) 
[15:44:01] <grrrit-wm>	 (03PS3) 10Gehel: elasticsearch - upgrade to Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/323154 (https://phabricator.wikimedia.org/T151325) 
[15:44:43] <jynus>	 !log stopping for 24 hours cross-dc replication on shards es2,es3 codfw->eqiad (es1015, es1019)
[15:44:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:45:02] <grrrit-wm>	 (03CR) 10Eevans: [C: 031] "Ready to pull the trigger." [puppet] - 10https://gerrit.wikimedia.org/r/324469 (https://phabricator.wikimedia.org/T151086) (owner: 10Eevans) 
[15:45:43] <grrrit-wm>	 (03CR) 10BryanDavis: [C: 031] Allow contentadmin and sysop to add/remove autopatrolled users on Wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324401 (owner: 10MarcoAurelio) 
[15:46:17] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995#2835134 (10jcrespo) !log stopping for 24 hours cross-dc replication on shards es2,es3 codfw->eqiad (es1015, es1019)
[15:50:37] <grrrit-wm>	 (03CR) 10Gehel: "Puppet compiler agrees this is a noop on production systems" [puppet] - 10https://gerrit.wikimedia.org/r/323154 (https://phabricator.wikimedia.org/T151325) (owner: 10Gehel) 
[15:50:37] <jynus>	 !log mysql restart and general upgrade for es2016 T151995
[15:50:41] <grrrit-wm>	 (03PS4) 10Gehel: elasticsearch - upgrade to Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/323154 (https://phabricator.wikimedia.org/T151325) 
[15:50:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:50:48] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[15:51:47] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] elasticsearch - upgrade to Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/323154 (https://phabricator.wikimedia.org/T151325) (owner: 10Gehel) 
[16:01:08] <grrrit-wm>	 (03PS2) 10Dzahn: enable instance restbase2012-c.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/324469 (https://phabricator.wikimedia.org/T151086) (owner: 10Eevans) 
[16:05:36] <grrrit-wm>	 (03PS1) 10Chad: group1 to wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324475 
[16:06:00] <grrrit-wm>	 (03CR) 10Chad: [C: 04-2] "this iz 4 l8r" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324475 (owner: 10Chad) 
[16:07:06] <wikibugs>	 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to beta cluster - https://phabricator.wikimedia.org/T150945#2802120 (10Addshore) a:03Addshore
[16:08:13] <wikibugs>	 06Operations, 10ops-eqiad, 10hardware-requests: Return wmf4747/wmf4748/wmf4749/wmf4750 to spares - https://phabricator.wikimedia.org/T146171#2835197 (10RobH) a:05Joe>03RobH Nope, it should come back to me to go back on spares, stealing!
[16:11:26] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] enable instance restbase2012-c.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/324469 (https://phabricator.wikimedia.org/T151086) (owner: 10Eevans) 
[16:11:52] <mutante>	 urandom: ^
[16:12:24] <urandom>	 mutante: thanks!
[16:12:56] <jynus>	 !log mysql restart and general upgrade for es2018 T151995
[16:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:09] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[16:13:26] <urandom>	 mutante: and it looks to be off and running on its own
[16:14:00] <mutante>	 urandom: you mean in a good way, it does it all by itself?
[16:14:11] <urandom>	 mutante: yeah
[16:14:15] <mutante>	 ok great :)
[16:15:26] <wikibugs_>	 06Operations, 10Wikimedia-General-or-Unknown, 07Availability, 13Patch-For-Review, and 2 others: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196#2835238 (10Joe) @matmarex I can confirm the jobqueue is now under control and I think the only real thing missing is...
[16:15:55] <wikibugs>	 06Operations, 06Parsing-Team, 06Release-Engineering-Team, 07HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835239 (10Joe) I have set arenas for jemalloc to be equal to the number of processors seen by the OS, the bandaid fix should be in the process of being remo...
[16:17:46] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1167 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:18:46] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1167 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[16:18:48] <_joe_>	 !log rolling upgrade of HHVM on the jobrunner, terbium/tin/wasat/mira
[16:18:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:35] <grrrit-wm>	 (03PS6) 10Andrew Bogott: bigbrother: Rewrite as python script [puppet] - 10https://gerrit.wikimedia.org/r/309216 (https://phabricator.wikimedia.org/T144955) (owner: 10BryanDavis) 
[16:19:56] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:20:56] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1303 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.001 second response time
[16:21:00] <wikibugs>	 06Operations, 06Parsing-Team, 06Release-Engineering-Team, 07HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835255 (10Joe) So, with the HHVM part "solved" we still should take the prevention measures I named here:  - Check the concurrency/retry/timeout rates of al...
[16:22:06] <wikibugs>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2835257 (10RobH) Correct, it was installed in the internal vlan, my bad!  It'll need reinstallation, as well as the dns and network port being updated.
[16:24:26] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1299 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.046 second response time
[16:25:01] <grrrit-wm>	 (03PS2) 10Addshore: DNM config for ElectronPdfService on beta sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322086 (https://phabricator.wikimedia.org/T150945) 
[16:25:26] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1299 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.005 second response time
[16:29:04] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1162 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:30:04] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1162 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[16:30:29] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] bigbrother: Rewrite as python script [puppet] - 10https://gerrit.wikimedia.org/r/309216 (https://phabricator.wikimedia.org/T144955) (owner: 10BryanDavis) 
[16:31:04] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:31:44] <icinga-wm>	 PROBLEM - cassandra-c CQL 10.192.48.70:9042 on restbase2012 is CRITICAL: connect to address 10.192.48.70 and port 9042: Connection refused
[16:32:04] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[16:34:07] <grrrit-wm>	 (03PS6) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[16:34:12] <grrrit-wm>	 (03PS7) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[16:34:19] <wikibugs_>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2835285 (10Ottomata) ​Ok, it can be reinstalled at will.  The puppet that is in place is fine (it might fail on the first run). Let me know when it is back up and I will make s...
[16:35:09] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Depool es1012 for maintenance and upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324481 (https://phabricator.wikimedia.org/T151995) 
[16:36:04] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:36:26] <grrrit-wm>	 (03PS1) 10ArielGlenn: pick up privatewikis fact from mediawiki config file [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 
[16:36:44] <icinga-wm>	 PROBLEM - puppet last run on mw1306 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[16:37:04] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.001 second response time
[16:37:17] <grrrit-wm>	 (03PS1) 10Cmjohnson: Revert "Adding mgmt dns entries for restabse1016-1018 T150964" [dns] - 10https://gerrit.wikimedia.org/r/324484 
[16:38:30] <paladox>	 gerrit.wm.org is slow for me
[16:38:35] <paladox>	 mutante apergos ^^
[16:38:58] <paladox>	 google loads fine so it's not my internet
[16:38:58] <icinga-wm>	 ACKNOWLEDGEMENT - cassandra-c CQL 10.192.48.70:9042 on restbase2012 is CRITICAL: connect to address 10.192.48.70 and port 9042: Connection refused eevans Bootstrapping
[16:39:04] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1305 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[16:39:08] <paladox>	 ostriches ^^
[16:39:15] <_joe_>	 paladox: same issue here
[16:39:25] <paladox>	 I think this is gc again
[16:39:34] <apergos>	 slow here also
[16:39:49] <_joe_>	 paladox: I'll abstain from guessing
[16:40:04] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1305 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.001 second response time
[16:40:20] <paladox>	 Ok
[16:41:11] <mutante>	 2016-11-30T16:40:33.633+0000: 585655.771: [GC (Allocation Failure)
[16:41:27] <paladox>	 cpu looks to be very high on https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&c=Miscellaneous+eqiad&h=cobalt.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS_%7C_network
[16:41:43] <grrrit-wm>	 (03Abandoned) 10Cmjohnson: Revert "Adding mgmt dns entries for restabse1016-1018 T150964" [dns] - 10https://gerrit.wikimedia.org/r/324484 (owner: 10Cmjohnson) 
[16:41:58] * cwd too: gerrit.wikimedia.org took too long to respond.
[16:42:43] <jynus>	 we are looking at it
[16:42:56] <mutante>	 root@cobalt:/var/lib/gerrit2/review_site/logs# tail -f error_log
[16:43:03] <mutante>	 	at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1391)
[16:43:16] <ostriches>	 On it
[16:43:19] <mutante>	 cool
[16:43:36] <apergos>	 gc logs look ok actually
[16:43:41] <apergos>	 what else do we have going on?
[16:43:44] <icinga-wm>	 PROBLEM - DPKG on mw1168 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:43:45] <grrrit-wm>	 (03PS3) 10Addshore: Enable ElectronPdfService extension on beta sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322086 (https://phabricator.wikimedia.org/T150945) 
[16:43:47] <grrrit-wm>	 (03PS1) 10Addshore: Enable ElectronPdfService extension on test wikis & mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324487 (https://phabricator.wikimedia.org/T150944) 
[16:43:48] <mutante>	 confirming that, gc log looks like it was fast
[16:43:52] <grrrit-wm>	 (03PS1) 10Addshore: Enable ElectronPdfService extension on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324488 (https://phabricator.wikimedia.org/T150943) 
[16:43:54] <grrrit-wm>	 (03PS1) 10Addshore: Enable ElectronPdfService extension on dewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324489 (https://phabricator.wikimedia.org/T150942) 
[16:44:06] <_joe_>	 it's failing its gc cycles, quite simply
[16:44:18] <_joe_>	 apergos: no they don't
[16:44:44] <icinga-wm>	 RECOVERY - DPKG on mw1168 is OK: All packages OK
[16:44:57] <apergos>	 it doesn't seem to be pausing for a long time for any of the cycles
[16:45:07] <_joe_>	 -Xmx28g
[16:45:10] <apergos>	 and there aren't very many of these cycles in the last few minutes
[16:45:26] <mutante>	 this feels different from the former gc slowdowns
[16:45:39] <apergos>	 _joe_:  what is it you are seeing?
[16:45:40] <_joe_>	 uhm no that's actually ok
[16:45:42] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[16:45:44] <mutante>	 and there is the stuff about findObjectsToPackUsingBitmaps(PackWriter.java:1822
[16:45:47] <mutante>	 jgit
[16:45:54] <ostriches>	 That's more concerning
[16:45:55] <paladox>	 jgit again
[16:46:09] <paladox>	 is it missing objects again?
[16:46:29] <ostriches>	 Hmmm
[16:46:33] <mutante>	 well, it did something to "find objects" and then it crashed?
[16:46:40] <paladox>	 Oh
[16:47:05] <ostriches>	 The missing objects triggered by upload-pack are spammy, but harmless, effectively.
[16:47:05] <paladox>	 I guess it may be finding that object that we had a problem with at the weekend?
[16:47:14] <ostriches>	 But trying to find objects to pack using bitmaps seems bad.
[16:47:25] <grrrit-wm>	 (03PS8) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[16:47:55] <mutante>	 there, a new error popped up
[16:48:09] <mutante>	 Internal error during upload-pack 
[16:48:21] <mutante>	 Missing commit
[16:48:42] <paladox>	 which commit is missing?
[16:49:10] <mutante>	 89af503db5298364bd77b2ecf997f0a88edc67e2
[16:49:32] <grrrit-wm>	 (03CR) 10Jcrespo: "Yes, this with https://gerrit.wikimedia.org/r/324153 + a few changes on the template will work." (031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[16:49:34] <ostriches>	 upload-pack is spammy but harmless
[16:49:37] <ostriches>	 Ignore it, red herring
[16:49:44] <ostriches>	 sudo lsof -u gerrit2 | wc -l
[16:49:44] <ostriches>	 4565
[16:49:48] <ostriches>	 ^ That's more interesting.
[16:49:58] <ostriches>	 Tons of low-traffic repos open by gerrit.
[16:50:15] <paladox>	 something similar: http://stackoverflow.com/questions/34654723/gerrit-to-gerrit-replication-issue
[16:50:25] <paladox>	 just I can't see if they had performance issues when it happened
[16:50:42] <ostriches>	 That's not even remotely related.
[16:50:43] <grrrit-wm>	 (03CR) 10Giuseppe Lavagetto: docker: add package provider (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/323815 (owner: 10Giuseppe Lavagetto) 
[16:50:47] <_joe_>	 akosiaris: ^^
[16:51:10] <jynus>	 ostriches, is it safe to use gerrit to deploy to scap right now or should I wait?
[16:51:29] <mutante>	 btw, it's working again for me
[16:51:35] <mutante>	 and the load is totally down now
[16:51:38] <paladox>	 ostriches would that be "•Prevent double closing of repository when merging changes."
[16:51:44] <ostriches>	 No.
[16:51:49] <paladox>	 ok
[16:52:10] <ostriches>	 jynus: Gimmie just another minute or two
[16:52:42] <jynus>	 sure
[16:55:19] <ostriches>	 jynus: Go ahead.
[16:56:05] <grrrit-wm>	 (03CR) 10Chad: [C: 031] "Actually let's land this today." [puppet] - 10https://gerrit.wikimedia.org/r/323655 (https://phabricator.wikimedia.org/T151676) (owner: 10Reedy) 
[16:56:44] <wikibugs_>	 06Operations, 10Analytics, 10Analytics-Cluster, 10Traffic: Enable Kafka native TLS in 0.9 and secure the kafka traffic with it - https://phabricator.wikimedia.org/T121561#2835376 (10Ottomata)
[16:56:51] <apergos>	 +1 on the disable
[16:57:48] <ostriches>	 The load spikes caused by auto-gcs are far worse than the savings we get from running gc's
[16:57:57] <ostriches>	 stupid jgit. i hates u
[16:58:01] <apergos>	 :-d
[16:58:22] <apergos>	 haters gonna hate
[16:58:38] <paladox>	 that's just a bug and should be fixed in the next update.
[16:58:50] <paladox>	 unless another bug peeks in
[16:58:58] <grrrit-wm>	 (03PS2) 10Dzahn: Disable git gc as source of breakages [puppet] - 10https://gerrit.wikimedia.org/r/323655 (https://phabricator.wikimedia.org/T151676) (owner: 10Reedy) 
[16:59:09] <mutante>	 +1 doing
[16:59:12] <grrrit-wm>	 (03CR) 10Jcrespo: pick up privatewikis fact from mediawiki config file (031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[16:59:20] <ostriches>	 I probably won't ever re-enable it. I don't trust jgit gc.
[16:59:23] <mutante>	 what all of you said
[16:59:25] <ostriches>	 Fix a bug: 3 more appear.
[16:59:46] <paladox>	 yep
[16:59:50] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool es1012 for maintenance and upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324481 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[17:00:59] <grrrit-wm>	 (03PS3) 10Dzahn: gerrit: Disable git gc as source of breakages [puppet] - 10https://gerrit.wikimedia.org/r/323655 (https://phabricator.wikimedia.org/T151676) (owner: 10Reedy) 
[17:01:06] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032 V: 032] gerrit: Disable git gc as source of breakages [puppet] - 10https://gerrit.wikimedia.org/r/323655 (https://phabricator.wikimedia.org/T151676) (owner: 10Reedy) 
[17:01:46] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1012 (duration: 00m 53s)
[17:01:54] <apergos>	 woo hoo
[17:01:57] <apergos>	 kill it dead
[17:01:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:02:25] <mutante>	 !log gerrit restarting to disable gc (config change 323655)
[17:02:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:03:20] <mutante>	 done
[17:03:44] <icinga-wm>	 RECOVERY - puppet last run on mw1306 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[17:03:52] <grrrit-wm>	 (03PS2) 10ArielGlenn: pick up privatewikis fact from mediawiki config file [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 
[17:04:33] <wikibugs_>	 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad: Rack and setup new restbase nodes - https://phabricator.wikimedia.org/T150964#2835404 (10Cmjohnson) All 3 restbase servers are racked and have mgmt access. I did not do production dns.   Switch config is completed as well. I set them up for 3 production c...
[17:04:45] <grrrit-wm>	 (03CR) 10ArielGlenn: pick up privatewikis fact from mediawiki config file (032 comments) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[17:05:14] <icinga-wm>	 PROBLEM - puppet last run on kafka2003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas]
[17:07:21] <wikibugs>	 06Operations, 10Wikimedia-Extension-setup, 07I18n: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693#2835414 (10Arthur2e5) > but even identifiers are in Chinese in some places  There were actually plans to use Chinese class & filenames in the java servlet (halted due to co...
[17:10:52] <grrrit-wm>	 (03CR) 10Jcrespo: "We should now confine this to the mariadb::sanitarium and mariadb::sanitarium2 roles." [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[17:12:06] <grrrit-wm>	 (03CR) 10Dzahn: [C: 04-1] "i think this should be done in the role instead where $domain and $altdom are being set" [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[17:13:14] <grrrit-wm>	 (03CR) 10Dzahn: "i mean, i see this is also "in the role" but further up we just set $domain which is then being used in the "base uri" string." [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[17:16:40] <wikibugs_>	 06Operations, 06Parsing-Team, 06Release-Engineering-Team, 07HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835456 (10greg) >>! In T151702#2835255, @Joe wrote: > So, with the HHVM part "solved" we still should take the prevention measures I named here: >  > - Chec...
[17:20:21] <grrrit-wm>	 (03PS9) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[17:20:25] <grrrit-wm>	 (03PS10) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[17:20:53] <jynus>	 !log mysql restart and general upgrade for es1012 T151995
[17:21:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:21:04] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[17:21:54] <icinga-wm>	 PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old.
[17:22:04] <icinga-wm>	 PROBLEM - puppet last run on ms-be1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:29:52] <wikibugs_>	 06Operations, 06Parsing-Team, 06Release-Engineering-Team, 07HHVM, and 3 others: API cluster failure / OOM - https://phabricator.wikimedia.org/T151702#2835543 (10Joe) @greg yeah I know, I'll do my homework, promised :)  I'm just waiting to see if the issue happens again in the next couple of days before clo...
[17:33:14] <icinga-wm>	 RECOVERY - puppet last run on kafka2003 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:37:14] <grrrit-wm>	 (03PS1) 10Chad: ExtDist: REL1_28 default, REL1_29 added (commented), REL1_26 removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324505 
[17:39:19] <grrrit-wm>	 (03CR) 10Chad: [C: 032] ExtDist: REL1_28 default, REL1_29 added (commented), REL1_26 removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324505 (owner: 10Chad) 
[17:39:54] <grrrit-wm>	 (03Merged) 10jenkins-bot: ExtDist: REL1_28 default, REL1_29 added (commented), REL1_26 removed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324505 (owner: 10Chad) 
[17:43:24] <grrrit-wm>	 (03CR) 10Addshore: "Repo requested at https://www.mediawiki.org/w/index.php?title=Git/New_repositories/Requests/Entries&diff=2298621&oldid=2298115" [puppet] - 10https://gerrit.wikimedia.org/r/322220 (https://phabricator.wikimedia.org/T147328) (owner: 10Addshore) 
[17:44:31] <grrrit-wm>	 (03CR) 10ArielGlenn: "Hm, so apparently facts are available on all nodes, whether or not the module itself is applied there. Maybe we would be better off with " [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[17:45:05] <logmsgbot>	 !log demon@tin Synchronized wmf-config/CommonSettings.php: extdist stuffs (duration: 00m 46s)
[17:45:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:46:04] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[17:47:04] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4757321 keys, up 30 days 9 hours - replication_delay is 0
[17:51:04] <icinga-wm>	 RECOVERY - puppet last run on ms-be1007 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[18:05:36] <grrrit-wm>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool es1012 for maintenance and upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324512 
[18:11:51] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2835710 (10fgiunchedi) @ema indeed the new upstream version should fix the errors. What's left to figure out is what to...
[18:13:16] <grrrit-wm>	 (03CR) 10Jcrespo: "I am not sure I want this running on all boxes, and getting random facts whenever that file randomly exists for other reasons. Maybe so" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[18:15:26] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool es1012 for maintenance and upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324512 (owner: 10Jcrespo) 
[18:17:24] <icinga-wm>	 PROBLEM - Disk space on labtestnet2001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=46%)
[18:19:06] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1012 (duration: 00m 46s)
[18:19:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:22:28] <icinga-wm>	 ACKNOWLEDGEMENT - Check for gridmaster host resolution TCP on labtest-ns0.wikimedia.org is CRITICAL: DNS CRITICAL - 0.160 seconds response time (No ANSWER SECTION found) daniel_zahn duration 200 days
[18:22:29] <icinga-wm>	 ACKNOWLEDGEMENT - Check for gridmaster host resolution UDP on labtest-ns0.wikimedia.org is CRITICAL: DNS CRITICAL - 0.114 seconds response time (No ANSWER SECTION found) daniel_zahn duration 200 days
[18:22:53] <wikibugs>	 06Operations, 10ops-eqiad, 13Patch-For-Review: eqiad: Rack and setup new restbase nodes - https://phabricator.wikimedia.org/T150964#2835751 (10fgiunchedi) >>! In T150964#2835404, @Cmjohnson wrote: > All 3 restbase servers are racked and have mgmt access. I did not do production dns.   Switch config is comple...
[18:24:41] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Depool es1013 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324518 (https://phabricator.wikimedia.org/T151995) 
[18:24:44] <icinga-wm>	 PROBLEM - puppet last run on ms-fe3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:25:24] <icinga-wm>	 RECOVERY - Disk space on labtestnet2001 is OK: DISK OK
[18:25:51] <mutante>	 !log labnet2001 - ran low on disk, gzipped large /var/log/upstart/nova-api.log.1 / apt-get clean 
[18:26:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:26:04] <mutante>	 labtestnet2001..fixing in wiki
[18:27:20] <mutante>	 !log last log message was about "labtestnet2001" not "labnet2001"
[18:27:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:28:25] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: Depool es1013 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324518 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[18:29:54] <icinga-wm>	 PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:29:59] <wikibugs_>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835766 (10mmodell) hmm... I'm not sure what's up with that. Can we just mount it as tmpfs?
[18:30:44] <icinga-wm>	 RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[18:31:33] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: Depool es1013 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324518 (https://phabricator.wikimedia.org/T151995) 
[18:36:44] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032 V: 032] mariadb: Depool es1013 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324518 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[18:39:24] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool es1013 (duration: 00m 45s)
[18:39:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:42:25] <wikibugs>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835781 (10Dzahn) Does Phabricator have a config option where to store the temp files?
[18:43:07] <jynus>	 !log mysql restart and general upgrade for es1013 T151995
[18:43:18] <grrrit-wm>	 (03PS1) 10RobH: thorium should be in analytics vlan [dns] - 10https://gerrit.wikimedia.org/r/324527 
[18:43:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:43:20] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[18:44:05] <grrrit-wm>	 (03CR) 10RobH: [C: 032] thorium should be in analytics vlan [dns] - 10https://gerrit.wikimedia.org/r/324527 (owner: 10RobH) 
[18:48:06] <wikibugs>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835806 (10Paladox) I think it is https://secure.phabricator.com/book/phabricator/article/configuring_file_storage/  "Engine: Local Disk"
[18:51:13] <wikibugs>	 06Operations, 10Icinga, 06Labs, 10Labs-Infrastructure: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835831 (10Dzahn) p:05Triage>03Low
[18:52:04] <icinga-wm>	 ACKNOWLEDGEMENT - Check for gridmaster host resolution TCP on labtest-ns0.wikimedia.org is CRITICAL: DNS CRITICAL - 0.160 seconds response time (No ANSWER SECTION found) daniel_zahn https://phabricator.wikimedia.org/T152024
[18:52:04] <icinga-wm>	 ACKNOWLEDGEMENT - Check for gridmaster host resolution UDP on labtest-ns0.wikimedia.org is CRITICAL: DNS CRITICAL - 0.114 seconds response time (No ANSWER SECTION found) daniel_zahn https://phabricator.wikimedia.org/T152024
[18:52:44] <icinga-wm>	 RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[18:52:46] <wikibugs>	 06Operations, 10Icinga, 06Labs, 10Labs-Infrastructure: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835820 (10Dzahn)
[18:55:06] <grrrit-wm>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool es1013 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324530 
[18:55:45] <icinga-wm>	 PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:59:25] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: eventlogging: keepLastValue for eventlogging_NavigationTiming [puppet] - 10https://gerrit.wikimedia.org/r/324532 
[18:59:31] <wikibugs_>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835865 (10Dzahn) Thanks, this sounds like it   ```     storage.local-disk.path: Set to some writable directory on local disk. Make that directory. ```  We could simply use /srv/tmp because there we have h...
[19:00:04] <jouncebot>	 addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T1900). Please do the needful.
[19:01:02] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] templates: add PTR for pdfrender [dns] - 10https://gerrit.wikimedia.org/r/324372 (owner: 10Filippo Giunchedi) 
[19:01:07] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: templates: add PTR for pdfrender [dns] - 10https://gerrit.wikimedia.org/r/324372 
[19:01:11] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [V: 032] templates: add PTR for pdfrender [dns] - 10https://gerrit.wikimedia.org/r/324372 (owner: 10Filippo Giunchedi) 
[19:01:20] <grrrit-wm>	 (03PS2) 10Ottomata: Allow misc directors to specify url path conditions as well as Host conditions [puppet] - 10https://gerrit.wikimedia.org/r/322964 
[19:01:42] <grrrit-wm>	 (03PS3) 10Ottomata: Allow misc directors to specify url path conditions as well as Host conditions [puppet] - 10https://gerrit.wikimedia.org/r/322964 
[19:02:02] <grrrit-wm>	 (03CR) 10Ottomata: [C: 031] eventlogging: keepLastValue for eventlogging_NavigationTiming [puppet] - 10https://gerrit.wikimedia.org/r/324532 (owner: 10Filippo Giunchedi) 
[19:02:38] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: templates: add logstash.svc [dns] - 10https://gerrit.wikimedia.org/r/324373 (https://phabricator.wikimedia.org/T151971) 
[19:02:45] <wikibugs>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835893 (10Dzahn) Of course there should be some cleanup but we also don't have to make it hard on us by limiting ourselves to this tiny 10G /  where you are constantly fighting to keep the remaining 2G fr...
[19:04:22] <logmsgbot>	 !log demon@tin Synchronized php-1.29.0-wmf.4/includes/specials/SpecialUserrights.php: Ia0e583a5 (duration: 00m 45s)
[19:04:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:05:31] <grrrit-wm>	 (03PS2) 10RobH: thorium should be in analytics vlan [dns] - 10https://gerrit.wikimedia.org/r/324527 
[19:05:33] <Amir1>	 Can I add stuff to SWAT now?
[19:06:04] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] templates: add logstash.svc [dns] - 10https://gerrit.wikimedia.org/r/324373 (https://phabricator.wikimedia.org/T151971) (owner: 10Filippo Giunchedi) 
[19:06:44] <grrrit-wm>	 (03Abandoned) 10RobH: thorium should be in analytics vlan [dns] - 10https://gerrit.wikimedia.org/r/324527 (owner: 10RobH) 
[19:06:54] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: eventlogging: keepLastValue for eventlogging_NavigationTiming [puppet] - 10https://gerrit.wikimedia.org/r/324532 
[19:09:50] <grrrit-wm>	 (03PS1) 10RobH: thorium vlan change [dns] - 10https://gerrit.wikimedia.org/r/324537 
[19:10:21] <grrrit-wm>	 (03CR) 10RobH: [C: 032] thorium vlan change [dns] - 10https://gerrit.wikimedia.org/r/324537 (owner: 10RobH) 
[19:10:24] <icinga-wm>	 PROBLEM - Disk space on stat1002 is CRITICAL: DISK CRITICAL - free space: /home 40583 MB (3% inode=98%)
[19:11:13] <godog>	 sigh, ci/jenkins backed up again, a lot of jobs in the queue
[19:11:59] <mutante>	 godog: nodepool
[19:12:08] <mutante>	 "leaking instances"
[19:12:11] <mutante>	 and stuff
[19:12:56] <godog>	 mhh that but also an influx of jobs backs things up, I think it is from core
[19:13:09] <mutante>	 08:09 <paladox> hashar migrated mediawiki-extensions-* tests to nodepool today
[19:13:12] <mutante>	 08:10 <paladox> but nodepool did get a bump in resources
[19:13:20] <mutante>	 the extensions
[19:13:23] <mutante>	 afaict
[19:14:38] <paladox>	 it could be a leak, or nodepool has reached its instance creation limit
[19:14:56] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] eventlogging: keepLastValue for eventlogging_NavigationTiming [puppet] - 10https://gerrit.wikimedia.org/r/324532 (owner: 10Filippo Giunchedi) 
[19:15:07] <paladox>	 mutante godog ^^
[19:16:05] <godog>	 eventually it flushes the queue of jobs heh so it makes progress, just that when hit with many jobs it takes a while to process all
[19:16:25] <paladox>	 At peak times, normally around now, it gets slow
[19:16:27] <godog>	 also afaict no priority among jobs so there can be starvation
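godog's point about jobs waiting behind a large influx can be illustrated with a small sketch. This is not how Zuul or nodepool actually schedule work; the job names, the priority numbers, and the `fifo_order` / `priority_order` helpers are all hypothetical. It only shows that with a plain first-in-first-out queue a late job waits behind the whole batch ahead of it, while a priority queue lets it jump ahead.

```python
import heapq
from collections import deque


def fifo_order(jobs):
    """Return job names in the order a single FIFO worker would run them."""
    queue = deque(jobs)
    order = []
    while queue:
        name, _priority = queue.popleft()
        order.append(name)
    return order


def priority_order(jobs):
    """Return job names in the order a priority queue would run them.

    A lower priority number runs first; the arrival index breaks ties so
    jobs of equal priority keep their submission order.
    """
    heap = [(priority, index, name) for index, (name, priority) in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _priority, _index, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical workload: a burst of 5 bulk jobs, then one urgent job.
jobs = [(f"bulk-{i}", 10) for i in range(5)] + [("urgent-fix", 0)]

fifo = fifo_order(jobs)
prio = priority_order(jobs)
print("FIFO position of urgent-fix:", fifo.index("urgent-fix"))
print("Priority position of urgent-fix:", prio.index("urgent-fix"))
```

In the FIFO case the urgent job runs last, after all five bulk jobs; with priorities it runs first and the bulk jobs keep their original order.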
[19:16:59] <paladox>	 mutante: the mediawiki-extensions jobs haven't actually been added to the test pipeline yet, they have only been created
[19:18:04] <mutante>	 paladox: ah! ok
[19:18:27] <paladox>	 yep
[19:19:18] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: depool db1017 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324539 (https://phabricator.wikimedia.org/T151995) 
[19:19:19] <wikibugs>	 06Operations, 10Icinga, 06Labs, 10Labs-Infrastructure: remove/fix "Check for gridmaster host resolution" Icinga check for "labtest" - https://phabricator.wikimedia.org/T152024#2835945 (10Krenair) "gridmaster host resolution" is a tools project specific thing, why is it even in icinga instead of shinken?
[19:19:28] <paladox>	 mutante apparently you can disable
[19:19:31] <paladox>	 the storage engine
[19:19:33] <paladox>	 https://github.com/wikimedia/phabricator/blob/bf75469a3427f7b9bab9628f6c6a62ec8f7e7f1f/src/applications/files/config/PhabricatorFilesConfigOptions.php#L176
[19:19:39] <paladox>	 on phabricator
[19:20:09] <paladox>	 It seems to be null already
[19:20:16] <paladox>	 So we must have it set somewhere in puppet
[19:21:10] <grrrit-wm>	 (03PS1) 10Ladsgroup: Add 'softest' values for ores [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324540 (https://phabricator.wikimedia.org/T150224) 
[19:21:24] <paladox>	 But I can't find it in puppet
[19:22:49] <Amir1>	 Anyone around to do swat? thcipriani ?
[19:24:54] <icinga-wm>	 RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[19:25:31] <thcipriani>	 yup, I'm around. Didn't see anything earlier...
[19:25:41] <paladox>	 twentyafterfour why is mysql limit small on https://phabricator.wikimedia.org/applications/view/PhabricatorFilesApplication/
[19:25:42] <paladox>	 ?
[19:26:31] <wikibugs_>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835964 (10fgiunchedi) @mmodell tmpfs won't help because it doesn't clean up old files by itself. @dzahn @Paladox yeah that might help if it actually gets honoured. My thought on that was that if all phab...
[19:27:04] <Amir1>	 thcipriani: https://gerrit.wikimedia.org/r/324540
[19:27:16] <Amir1>	 and https://gerrit.wikimedia.org/r/324541 
[19:27:27] <Amir1>	 before wmf.4 gets wikidata 
[19:27:38] <Amir1>	 it would be great
[19:27:59] <thcipriani>	 Amir1: sure I can get those out. Could you add them to the deployments page?
[19:28:09] <Amir1>	 Doing it right now
[19:28:13] <wikibugs>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2835971 (10Paladox) @fgiunchedi hi, looking at it more it says we should be using MySQL by default so we may have a bug in that since it should not have set a local disk path as I see no setting has been s...
[19:28:22] <wikibugs_>	 06Operations, 10DBA: Rolling restart of parsercache servers for TLS certificate update - https://phabricator.wikimedia.org/T152029#2835972 (10jcrespo)
[19:28:47] <thcipriani>	 cool thanks :)
[19:29:09] <wikibugs>	 06Operations, 10netops: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2835986 (10faidon) JTAC thinks this may be [[  https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR957108 | PR957108 ]]: ``` Title: IPv6 neighbor...
[19:29:32] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324540 (https://phabricator.wikimedia.org/T150224) (owner: 10Ladsgroup) 
[19:29:43] <wikibugs_>	 06Operations, 10DBA: Rolling restart of parsercache servers for TLS certificate update - https://phabricator.wikimedia.org/T152029#2835972 (10jcrespo) p:05Normal>03High
[19:29:58] <wikibugs>	 06Operations, 10Analytics: Install java 8 to stat1002 - https://phabricator.wikimedia.org/T151896#2835991 (10Ottomata) Hm hm.  openjdk-7-jdk is required in a few classes that get included on stat1002.  As long as the default alternative isn't updated as consequence of doing a `require_package('openjdk-8-jdk'0)...
[19:30:05] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add 'softest' values for ores [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324540 (https://phabricator.wikimedia.org/T150224) (owner: 10Ladsgroup) 
[19:30:28] <grrrit-wm>	 (03PS2) 10Ottomata: Add a reportupdater job for ee-migration [puppet] - 10https://gerrit.wikimedia.org/r/324466 (https://phabricator.wikimedia.org/T126358) (owner: 10Mforns) 
[19:31:10] <thcipriani>	 Amir1: https://gerrit.wikimedia.org/r/#/c/324540/1 is live on mwdebug1002 if there is anything to check there
[19:31:26] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: Upgrade parsercache servers to use the puppet TLS cert [puppet] - 10https://gerrit.wikimedia.org/r/324542 (https://phabricator.wikimedia.org/T152029) 
[19:31:28] <Amir1>	 yeah, let me check
[19:32:58] <Amir1>	 thcipriani: the config one works like a charm
[19:33:05] <thcipriani>	 Amir1: ok, going live
[19:33:48] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: Upgrade parsercache servers to use the puppet TLS cert [puppet] - 10https://gerrit.wikimedia.org/r/324542 (https://phabricator.wikimedia.org/T152029) (owner: 10Jcrespo) 
[19:34:30] <grrrit-wm>	 (03CR) 10Ottomata: [C: 032] Add a reportupdater job for ee-migration [puppet] - 10https://gerrit.wikimedia.org/r/324466 (https://phabricator.wikimedia.org/T126358) (owner: 10Mforns) 
[19:34:41] <grrrit-wm>	 (03PS3) 10Ottomata: Add a reportupdater job for ee-migration [puppet] - 10https://gerrit.wikimedia.org/r/324466 (https://phabricator.wikimedia.org/T126358) (owner: 10Mforns) 
[19:34:44] <grrrit-wm>	 (03CR) 10Ottomata: [V: 032] Add a reportupdater job for ee-migration [puppet] - 10https://gerrit.wikimedia.org/r/324466 (https://phabricator.wikimedia.org/T126358) (owner: 10Mforns) 
[19:34:51] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:324540|Add "softest" values for ores]] T150224 (duration: 00m 46s)
[19:34:53] <paladox>	 Someone called this https://phabricator.wikimedia.org/applications/view/PhabricatorFilesApplication/
[19:34:54] <icinga-wm>	 PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 17 failures. Last run 2 minutes ago with 17 failures. Failed resources (up to 3 shown): Service[ferm],Service[diamond],Service[prometheus-node-exporter],Package[ecryptfs-utils]
[19:34:58] <paladox>	 blob store for pokemon
[19:35:01] <thcipriani>	 ^ Amir1 live everywhere now
[19:35:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:03] <stashbot>	 T150224: Add "Lowest" ORES sensitivity for fpr=0.1 - https://phabricator.wikimedia.org/T150224
[19:35:07] <paladox>	 mutante twentyafterfour ^^ lol
[19:35:16] <Amir1>	 thcipriani: amazing
[19:35:17] <Amir1>	 thanks
[19:36:06] <thcipriani>	 yw :)
[19:36:28] <thcipriani>	 Amir1: css change is for wmf.4 is live on mwdebug1002
[19:36:37] <thcipriani>	 s/is//
[19:36:54] <Amir1>	 since no wiki in group0 has ores enabled it's not testable 
[19:37:33] <thcipriani>	 ok, will push live
[19:38:09] <Amir1>	 In beta it's great: https://en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:RecentChanges&hidenondamaging=1
[19:38:35] <Amir1>	 One thing. Anyone admin on enwiki to make a mediawiki page for now?
[19:38:53] <Amir1>	 (You can also use the staff account)
[19:39:36] <mutante>	 paladox: hehe, omg..   Pokémon is a trademark  but only with the accent on the e :p
[19:39:48] <paladox>	 Yep lol
[19:39:56] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.29.0-wmf.4/extensions/ORES/modules/ext.ores.styles.css: SWAT: [[gerrit:324541|Use darker shade of yellow]] (duration: 00m 45s)
[19:39:57] <wikibugs_>	 06Operations, 10Traffic, 13Patch-For-Review: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643#2823504 (10Volans) Quick summary from my tests of today:   - most of the time is spent calling `_callBack()` in varnishapi.py:879`   - regarding the 2 `while 1:`...
[19:40:03] <thcipriani>	 ^ Amir1 css change should be live
[19:40:03] <paladox>	 it is used for storing pokemon according to desc lol
[19:40:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:11] <godog>	 odd I just lost my gerrit session
[19:40:13] <grrrit-wm>	 (03CR) 10ArielGlenn: "I see your point, will think about this some more." [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/324482 (owner: 10ArielGlenn) 
[19:40:13] <Amir1>	 thcipriani: thanks :)
[19:40:25] <grrrit-wm>	 (03PS1) 10Jcrespo: mariadb: update my.cnf for parsercache tls implementation [puppet] - 10https://gerrit.wikimedia.org/r/324543 (https://phabricator.wikimedia.org/T152029) 
[19:41:07] <mutante>	 godog: there was a service restart but a little over 2 hours ago
[19:41:58] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: update my.cnf for parsercache tls implementation [puppet] - 10https://gerrit.wikimedia.org/r/324543 (https://phabricator.wikimedia.org/T152029) 
[19:43:09] <jynus>	 can I quickly deploy some db pool/depools to mediawiki?
[19:43:30] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: update my.cnf for parsercache tls implementation [puppet] - 10https://gerrit.wikimedia.org/r/324543 (https://phabricator.wikimedia.org/T152029) (owner: 10Jcrespo) 
[19:43:47] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: prometheus: add vhtcpd stats via node-exporter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/323559 (https://phabricator.wikimedia.org/T147429) (owner: 10Filippo Giunchedi) 
[19:44:02] <godog>	 mutante: ah thanks, I've already logged in after that, bah
[19:45:57] <godog>	 volans: I'm going to merge the vhtcp patch for now and defer to the task for the filename convention
[19:46:14] <jynus>	 I am going to assume nobody is deploying and go ahead
[19:46:34] <grrrit-wm>	 (03PS2) 10Jcrespo: Revert "mariadb: Depool es1013 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324530 
[19:46:46] <wikibugs>	 06Operations, 10ops-eqiad: return/replace bad JNP-QSFP- DAC-5M - https://phabricator.wikimedia.org/T152032#2836061 (10RobH)
[19:47:04] <wikibugs_>	 06Operations, 10ops-eqiad: return/replace bad JNP-QSFP- DAC-5M - https://phabricator.wikimedia.org/T152032#2836078 (10RobH) >>! In T149726#2763763, @Cmjohnson wrote: > The cable information  >  > 740-0328625 REV 01 5.0M > MOLEX QFSP+ 1110409057 REV 8 > MOC15506250085 MADE IN CHINA
[19:47:25] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool es1013 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324530 (owner: 10Jcrespo) 
[19:47:38] <grrrit-wm>	 (03PS2) 10Jcrespo: mariadb: depool db1017 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324539 (https://phabricator.wikimedia.org/T151995) 
[19:48:22] <volans>	 godog: ok, but let's find an agreement so that we have one way to do that, and at least for new things we stick to it
[19:49:07] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] mariadb: depool db1017 for maintenance and general upgrade [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324539 (https://phabricator.wikimedia.org/T151995) (owner: 10Jcrespo) 
[19:50:17] <godog>	 yeah, the underscore/dash will be odd since it is just different conventions, anyways will comment on the task
[19:50:50] <godog>	 that'd be T144169
[19:50:50] <stashbot>	 T144169: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169
[19:50:54] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1013; depool es1017 (duration: 00m 45s)
[19:51:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:52:42] <volans>	 godog: yeah, thanks
[19:53:14] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[19:54:14] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4756916 keys, up 30 days 11 hours - replication_delay is 59
[19:55:01] <wikibugs>	 06Operations, 10ops-eqiad: return/replace bad JNP-QSFP- DAC-5M - https://phabricator.wikimedia.org/T152032#2836101 (10RobH) Case ID 2016-1130-0744 has been created for you.
[19:56:31] <grrrit-wm>	 (03PS4) 10Filippo Giunchedi: prometheus: add vhtcpd stats via node-exporter [puppet] - 10https://gerrit.wikimedia.org/r/323559 (https://phabricator.wikimedia.org/T147429) 
[19:57:54] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add vhtcpd stats via node-exporter [puppet] - 10https://gerrit.wikimedia.org/r/323559 (https://phabricator.wikimedia.org/T147429) (owner: 10Filippo Giunchedi) 
[20:00:04] <jouncebot>	 twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2000).
[20:01:47] <icinga-wm>	 PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:02:13] <godog>	 heh that's me, fix incoming
[20:02:26] <godog>	 I might be able to race puppet fails
[20:02:30] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: prometheus: add missing 'd' to prometheus-vhtcp-stats [puppet] - 10https://gerrit.wikimedia.org/r/324545 
[20:02:37] <icinga-wm>	 PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:02:53] <grrrit-wm>	 (03CR) 10Catrope: "FIXME: This should not have been deployed yet. Because https://gerrit.wikimedia.org/r/#/c/320328/ hasn't rolled out everywhere, the "softe" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324540 (https://phabricator.wikimedia.org/T150224) (owner: 10Ladsgroup) 
[20:03:23] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] prometheus: add missing 'd' to prometheus-vhtcp-stats [puppet] - 10https://gerrit.wikimedia.org/r/324545 (owner: 10Filippo Giunchedi) 
[20:04:47] <icinga-wm>	 RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[20:04:47] <icinga-wm>	 PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:04:48] <icinga-wm>	 PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:04:57] <icinga-wm>	 PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:05:24] <aharoni>	 Hallo.
[20:05:47] <icinga-wm>	 PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:05:48] <aharoni>	 Is the train running today for non-Wikipedias + 2 Wikipedias?
[20:05:57] <icinga-wm>	 PROBLEM - puppet last run on cp1047 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:06:08] <icinga-wm>	 PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:06:47] <icinga-wm>	 PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:06:47] <icinga-wm>	 PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-vhtcpd-stats]
[20:07:09] <godog>	 that's me, recovering at the next puppet run
[20:07:22] <jynus>	 !log mysql restart and general upgrade for es1017 T151995
[20:07:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:33] <stashbot>	 T151995: Rolling restart of external storage servers for TLS certificate update - https://phabricator.wikimedia.org/T151995
[20:09:01] <Krenair>	 aharoni, I think so
[20:09:02] <Krenair>	 jouncebot, next
[20:09:03] <jouncebot>	 In 0 hour(s) and 50 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2100)
[20:09:06] <Krenair>	 hm
[20:09:27] <icinga-wm>	 PROBLEM - puppet last run on db1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:09:45] <Krenair>	 aharoni, yeah, now, theoretically
[20:10:37] <MatmaRex>	 jouncebot: now
[20:10:37] <jouncebot>	 For the next 1 hour(s) and 49 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2000)
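jouncebot's reply can be reproduced with simple arithmetic. The sketch below assumes the train window ends at 22:00 (inferred from the reply, not stated in the log) and that leftover seconds are dropped; the `remaining` helper is hypothetical and is not jouncebot's actual code.

```python
from datetime import datetime


def remaining(now, end):
    """Format the time left until `end`, dropping leftover seconds."""
    total_minutes = int((end - now).total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hour(s) and {minutes} minute(s)"


now = datetime(2016, 11, 30, 20, 10, 37)  # timestamp of the "now" request
end = datetime(2016, 11, 30, 22, 0, 0)    # assumed end of the train window
print(remaining(now, end))
```

From 20:10:37 to 22:00:00 is 1 hour, 49 minutes and 23 seconds, which truncates to the "1 hour(s) and 49 minute(s)" shown above. The same calculation from 20:09:03 to a 21:00 start gives the "0 hour(s) and 50 minute(s)" in the earlier "next" reply.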
[20:13:14] <grrrit-wm>	 (03CR) 10Chad: [C: 032] group1 to wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324475 (owner: 10Chad) 
[20:13:49] <grrrit-wm>	 (03Merged) 10jenkins-bot: group1 to wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324475 (owner: 10Chad) 
[20:13:57] <ostriches>	 all aboard
[20:14:47] <logmsgbot>	 !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 to wmf.4
[20:15:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:15:28] <wikibugs_>	 06Operations, 10Phabricator: iridium / filesystem almost full - https://phabricator.wikimedia.org/T150396#2836208 (10Dzahn) Adding the cron with a "find" to delete older files won't be hard, but _how_ old is old enough to delete?
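The "cron with a find" idea could look roughly like the sketch below. It is only an illustration: the `delete_older_than` helper is hypothetical, and the age threshold is left as a parameter because how old is "old enough" is exactly the open question in the task. It defaults to a dry run so nothing is removed until the cutoff has been agreed.

```python
import os
import time


def delete_older_than(directory, max_age_days, dry_run=True, now=None):
    """Find non-directory entries under `directory` whose modification
    time is more than `max_age_days` days ago, and return their paths.

    With dry_run=True (the default) nothing is removed; with
    dry_run=False each matching entry is deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    matched = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                # lstat so a symlink is judged by its own mtime, not its target's
                if os.lstat(path).st_mtime < cutoff:
                    matched.append(path)
                    if not dry_run:
                        os.remove(path)
            except FileNotFoundError:
                # The file vanished between listing and stat/remove; skip it.
                continue
    return matched
```

A dry run such as `delete_older_than("/srv/tmp", 7)` (path and 7-day cutoff chosen purely as an example) lists what would be removed, which may help in deciding on a safe age before enabling deletion.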
[20:15:46] <grrrit-wm>	 (03CR) 10Ladsgroup: "If that patch gets deployed before this one that would cause lots of inconsistency with data (because the default value is actually less t" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324540 (https://phabricator.wikimedia.org/T150224) (owner: 10Ladsgroup) 
[20:21:27] <wikibugs_>	 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: Flake8 for python files without extension in puppet repo - https://phabricator.wikimedia.org/T144169#2590514 (10fgiunchedi) After some discussion in https://gerrit.wikimedia.org/r/#/c/323559/ I've changed my vote to "automatica...
[20:21:43] <grrrit-wm>	 (03PS1) 10Jcrespo: Revert "mariadb: depool db1017 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324548 
[20:22:09] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 04-2] "Wait for buffer pool warmup." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324548 (owner: 10Jcrespo) 
[20:23:38] <ostriches>	 jouncebot: I'm done with the train deploy. Gonna get some lunch, you want any?
[20:24:07] <icinga-wm>	 PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:24:33] <godog>	 stoic jouncebot 
[20:29:37] <icinga-wm>	 RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[20:32:47] <icinga-wm>	 RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[20:32:47] <icinga-wm>	 RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[20:32:57] <icinga-wm>	 RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[20:32:57] <icinga-wm>	 RECOVERY - puppet last run on cp1047 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[20:33:12] <grrrit-wm>	 (03Draft1) 10Paladox: Phabricator: Allow us to change the default web domain in apache [puppet] - 10https://gerrit.wikimedia.org/r/324551 
[20:33:14] <grrrit-wm>	 (03Draft2) 10Paladox: Phabricator: Allow us to change the default web domain in apache [puppet] - 10https://gerrit.wikimedia.org/r/324551 
[20:33:36] <wikibugs>	 06Operations, 10Analytics: Install java 8 to stat1002 - https://phabricator.wikimedia.org/T151896#2836283 (10EBernhardson) I am specifically doing some machine learning exploration using RankLib, a java implementation of various ML ranking algorithms (that could perhaps be integrated to an elasticsearch plugin...
[20:33:47] <icinga-wm>	 RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[20:34:01] <grrrit-wm>	 (03PS11) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:34:06] <grrrit-wm>	 (03PS12) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:34:08] <icinga-wm>	 RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[20:34:47] <icinga-wm>	 RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[20:34:47] <icinga-wm>	 RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[20:35:43] <grrrit-wm>	 (03CR) 10Dzahn: [C: 04-1] "please don't call it "base_uri" when it's not. what you are looking up and then setting is just $domain and $altdom. Later in the code the" [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[20:36:03] <grrrit-wm>	 (03PS13) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:37:05] <grrrit-wm>	 (03PS14) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:38:23] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: logstash: switch to /srv partitioning for ingester hosts [puppet] - 10https://gerrit.wikimedia.org/r/324362 (https://phabricator.wikimedia.org/T150108) 
[20:38:27] <icinga-wm>	 RECOVERY - puppet last run on db1052 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[20:39:54] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032] logstash: switch to /srv partitioning for ingester hosts [puppet] - 10https://gerrit.wikimedia.org/r/324362 (https://phabricator.wikimedia.org/T150108) (owner: 10Filippo Giunchedi) 
[20:41:11] <grrrit-wm>	 (03CR) 10Dzahn: [C: 04-1] "- you are changing "git.wikimedia.org" to "git-ssh.wikimedia.org" it looks" [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[20:41:45] <grrrit-wm>	 (03CR) 10Paladox: "Woops sorry, fixing it now." [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[20:41:49] <grrrit-wm>	 (03CR) 10Dzahn: "let's call it "altdomain" like before, instead of "security domain"" [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[20:42:19] <grrrit-wm>	 (03PS15) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:44:40] <grrrit-wm>	 (03PS16) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:44:47] <grrrit-wm>	 (03PS17) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:45:47] <icinga-wm>	 PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:50:18] <wikibugs_>	 06Operations, 10Traffic, 06WMF-Communications, 07HTTPS, 07Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2836315 (10Florian)
[20:51:07] <icinga-wm>	 RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[20:51:20] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: graphite: cleanup labs instances metrics [puppet] - 10https://gerrit.wikimedia.org/r/323339 (https://phabricator.wikimedia.org/T143405) 
[20:52:18] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] graphite: cleanup labs instances metrics [puppet] - 10https://gerrit.wikimedia.org/r/323339 (https://phabricator.wikimedia.org/T143405) (owner: 10Filippo Giunchedi) 
[20:52:56] <logmsgbot>	 !log milimetric@tin Starting deploy [analytics/refinery@9cd8845]: (no message)
[20:53:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:54:59] <grrrit-wm>	 (03PS3) 10Filippo Giunchedi: graphite: cleanup labs instances metrics [puppet] - 10https://gerrit.wikimedia.org/r/323339 (https://phabricator.wikimedia.org/T143405) 
[20:55:54] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] graphite: cleanup labs instances metrics [puppet] - 10https://gerrit.wikimedia.org/r/323339 (https://phabricator.wikimedia.org/T143405) (owner: 10Filippo Giunchedi) 
[20:55:59] <logmsgbot>	 !log milimetric@tin Finished deploy [analytics/refinery@9cd8845]: (no message) (duration: 03m 02s)
[20:56:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:56:47] <grrrit-wm>	 (03PS18) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[20:57:09] <grrrit-wm>	 (03PS4) 10Filippo Giunchedi: graphite: cleanup labs instances metrics [puppet] - 10https://gerrit.wikimedia.org/r/323339 (https://phabricator.wikimedia.org/T143405) 
[20:57:50] <grrrit-wm>	 (03PS19) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[21:00:04] <paladox>	 ostriches why is the cpu at 92% https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&c=Miscellaneous+eqiad&h=cobalt.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS_%7C_network
[21:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2100).
[21:00:05] <paladox>	 ?
[21:00:26] <halfak>	 Nothing for ORES
[21:00:29] <subbu>	 no parsoid deploy today
[21:01:27] <grrrit-wm>	 (03PS3) 10Andrew Bogott: Labs configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323698 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[21:02:43] <ostriches>	 paladox: l10nbot pushing to a billion repos, probably.
[21:02:45] * ostriches shrugs
[21:02:49] * ostriches goes back to lunch
[21:02:52] <paladox>	 Oh
[21:03:01] <paladox>	 ostriches but it keeps doing it every few hours?
[21:07:18] <ostriches>	 Oh well
[21:07:32] <wikibugs>	 06Operations, 10Gerrit, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2836410 (10Dzahn) today we disabled gc on gerrit completely  https://gerrit.wikimedia.org/r/#/c/323655/  this was linked to T151676 a related ticket
[21:07:35] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Labs configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323698 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[21:08:17] <wikibugs>	 06Operations, 10Gerrit, 07Beta-Cluster-reproducible, 13Patch-For-Review: gerrit jgit gc caused mediawiki/core repo problems - https://phabricator.wikimedia.org/T151676#2824332 (10Dzahn) now gc is disabled.  also see T148478
[21:09:14] <wikibugs>	 06Operations, 10Gerrit, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016  30/11/2016 - https://phabricator.wikimedia.org/T148478#2724179 (10Dzahn)
[21:09:18] <grrrit-wm>	 (03PS11) 10Andrew Bogott: base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) 
[21:09:47] <icinga-wm>	 PROBLEM - puppet last run on ms-be3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:10:18] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/256890 (https://phabricator.wikimedia.org/T120159) (owner: 10Yuvipanda) 
[21:12:48] <icinga-wm>	 PROBLEM - puppet last run on es2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:12:57] <icinga-wm>	 PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:12:57] <icinga-wm>	 PROBLEM - puppet last run on mw1290 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:07] <icinga-wm>	 PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[21:13:07] <icinga-wm>	 PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:07] <icinga-wm>	 PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[21:13:08] <icinga-wm>	 PROBLEM - puppet last run on mw2116 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:17] <icinga-wm>	 PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:17] <icinga-wm>	 PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[21:13:17] <icinga-wm>	 PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:17] <icinga-wm>	 PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:27] <icinga-wm>	 PROBLEM - puppet last run on db1088 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:27] <icinga-wm>	 PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:27] <icinga-wm>	 PROBLEM - puppet last run on mw2246 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[21:13:27] <icinga-wm>	 PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:37] <icinga-wm>	 PROBLEM - puppet last run on rdb2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:37] <icinga-wm>	 PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:37] <icinga-wm>	 PROBLEM - puppet last run on mw2243 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:37] <icinga-wm>	 PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on elastic2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on labtestweb2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:47] <icinga-wm>	 PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:48] <icinga-wm>	 PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:48] <icinga-wm>	 PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:49] <icinga-wm>	 PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[21:13:49] <icinga-wm>	 PROBLEM - puppet last run on db2046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:13:50] <icinga-wm>	 PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:07] <icinga-wm>	 PROBLEM - puppet last run on mw2153 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:07] <icinga-wm>	 PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:07] <icinga-wm>	 PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:07] <icinga-wm>	 PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:07] <icinga-wm>	 PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:08] <icinga-wm>	 PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:08] <icinga-wm>	 PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:08] <icinga-wm>	 PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:08] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:10] <icinga-wm>	 PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:10] <icinga-wm>	 PROBLEM - puppet last run on db2069 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:10] <icinga-wm>	 PROBLEM - puppet last run on mw2139 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:17] <icinga-wm>	 PROBLEM - puppet last run on cp2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:17] <icinga-wm>	 PROBLEM - puppet last run on elastic2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:17] <icinga-wm>	 PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:17] <icinga-wm>	 PROBLEM - puppet last run on mc2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:27] <icinga-wm>	 PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:37] <icinga-wm>	 PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:37] <icinga-wm>	 PROBLEM - puppet last run on pybal-test2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:37] <icinga-wm>	 PROBLEM - puppet last run on restbase2012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on elastic1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:47] <icinga-wm>	 PROBLEM - puppet last run on elastic2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:48] <icinga-wm>	 PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:52] <mutante>	 .
[21:14:57] <icinga-wm>	 PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:57] <icinga-wm>	 PROBLEM - puppet last run on mc2009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:57] <icinga-wm>	 PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:57] <icinga-wm>	 PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:14:57] <icinga-wm>	 PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:07] <icinga-wm>	 PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:07] <icinga-wm>	 PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:08] <icinga-wm>	 PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:08] <icinga-wm>	 PROBLEM - puppet last run on mc1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:08] <icinga-wm>	 PROBLEM - puppet last run on mw2221 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:08] <icinga-wm>	 PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:08] <icinga-wm>	 PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:17] <icinga-wm>	 PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:28] <mutante>	 andrewbogott: change to base puppet?
[21:15:28] <icinga-wm>	 PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:28] <icinga-wm>	 PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:28] <icinga-wm>	 PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:28] <icinga-wm>	 PROBLEM - puppet last run on mw2083 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on cp2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:37] <icinga-wm>	 PROBLEM - puppet last run on mw2229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:38] <icinga-wm>	 PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:45] <andrewbogott>	 hm...
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:47] <andrewbogott>	 mutante: let me look
[21:15:47] <icinga-wm>	 PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:48] <icinga-wm>	 PROBLEM - puppet last run on ganeti2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:48] <icinga-wm>	 PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:49] <icinga-wm>	 PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:49] <icinga-wm>	 PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:50] <icinga-wm>	 PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on auth1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on rdb2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on mw2075 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:15:57] <icinga-wm>	 PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:04] * mutante suggests typing /ignore icinga-wm
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on mw2247 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:07] <icinga-wm>	 PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on mw1300 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:17] <icinga-wm>	 PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:27] <icinga-wm>	 PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:27] <icinga-wm>	 PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:27] <icinga-wm>	 PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:32] <mutante>	 just for the moment
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:37] <icinga-wm>	 PROBLEM - puppet last run on mw2079 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:38] <icinga-wm>	 PROBLEM - puppet last run on maps2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:47] <icinga-wm>	 PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:47] <icinga-wm>	 PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:47] <icinga-wm>	 PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:48] <icinga-wm>	 PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:48] <icinga-wm>	 PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:48] <icinga-wm>	 PROBLEM - puppet last run on mw2082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:16:48] <icinga-wm>	 PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:17:00] <mutante>	 !log temp. stopping ircecho
[21:17:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:17:21] <grrrit-wm>	 (03PS1) 10Andrew Bogott: Revert "base: Allow auto puppetmaster switching tuning" [puppet] - 10https://gerrit.wikimedia.org/r/324568 
[21:18:12] <andrewbogott>	 sorry for the noise, I'm investigating
[21:18:34] <mutante>	 andrewbogott: thanks
[21:18:36] <mutante>	 there is this
[21:18:38] <mutante>	 Could not understand source #!/bin/bash
[21:19:19] <mutante>	 i'll keep an eye on icinga and the bot
[21:20:22] <andrewbogott>	 that's the problem.  But...
[21:20:27] <andrewbogott>	 why is that not allowed in a .erb file?
[21:21:34] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] Revert "base: Allow auto puppetmaster switching tuning" [puppet] - 10https://gerrit.wikimedia.org/r/324568 (owner: 10Andrew Bogott) 
[21:22:58] <godog>	 I'm off to lunch, brb
[21:23:09] <andrewbogott>	 puppet should be happy again for now
[21:23:21] <andrewbogott>	 I'm still confused about the erb parsing situation, but I'll make a test setup
[21:23:36] <mutante>	 thanks
[21:24:21] <mutante>	 yea, looks good, i'll get the bot back when the number of crits is down
[21:24:34] <mutante>	 the one i was on works
[21:24:43] <wikibugs>	 06Operations, 10Domains, 10Traffic, 06WMF-Legal: Register nlwikipedia.org to prevent squatting - https://phabricator.wikimedia.org/T128968#2836472 (10CRoslof) 05Invalid>03Resolved Update: The Wikimedia Foundation has acquired nlwikipedia.org
[21:25:58] <mutante>	 tickets with domains and legal on it and i see them resolved on IRC, love it
[21:26:08] <sjoerddebruin>	 Yeah, great! :)
[21:26:09] <mutante>	 robh: 
[21:36:16] <mutante>	 !log phab/iridium: deleting tmp files older than 2 weeks
[21:36:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:45:10] <godog>	 andrewbogott: it should be content => for templates not source =>
[21:46:00] <andrewbogott>	 oh, maybe it's just that
[21:46:03] <andrewbogott>	 thanks
[21:46:11] <Krenair>	 jouncebot, refresh
[21:46:13] <Krenair>	 jouncebot, next
[21:46:14] <grrrit-wm>	 (03PS1) 10Dzahn: phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) 
[21:46:14] <jouncebot>	 I refreshed my knowledge about deployments.
[21:46:14] <jouncebot>	 In 0 hour(s) and 13 minute(s): Finnish Wikivoyage wiki creation (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2200)
[21:46:58] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[21:48:04] <grrrit-wm>	 (03PS2) 10Dzahn: phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) 
[21:48:36] <James_F>	 Krenair: Remember you'll need to do a parsoid service patch too.
[21:48:58] <Krenair>	 that happens afterwards
[21:48:59] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[21:49:00] <James_F>	 (Which can't be done until the new wiki is live, due to the way it pulls from the API or whatever.)
[21:49:00] <James_F>	 Yeh.
[21:49:04] <grrrit-wm>	 (03CR) 10Dzahn: "i feel like find is known working and easy enough (vs using tmpreaper)" [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[21:49:12] <James_F>	 But ideally as part of the same deploy window. :-)
[21:49:22] <godog>	 andrewbogott: yeah I think it is that judging from the error
[21:50:20] <godog>	 on the change itself I always cringe a little when we're mixing programming and erb templates :(
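Editor's note: godog's diagnosis above (use content => for templates, not source =>) can be illustrated with a minimal Puppet sketch. The file path is the one named in the icinga alerts; the template path 'base/puppet-run.erb' is a hypothetical name chosen for illustration, since the reverted change itself is not shown in this log. This is a sketch under those assumptions, not the actual patch.

```puppet
# Broken form (shown commented out). `source` expects a file location such as
# puppet:///modules/<module>/<file>, so passing it the rendered template text
# would plausibly yield "Could not understand source #!/bin/bash".
#
# file { '/usr/local/sbin/puppet-run':
#     ensure => present,
#     source => template('base/puppet-run.erb'),  # hypothetical template path
# }

# Working form: `content` takes the rendered template as a string.
file { '/usr/local/sbin/puppet-run':
    ensure  => present,
    content => template('base/puppet-run.erb'),  # hypothetical template path
}
```

The two attributes are mutually exclusive on a file resource: source copies an existing file, while content writes a string, which is what template() returns.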
[21:50:57] <Krenair>	 James_F, ideally we'd actually be able to completely create a wiki
[21:51:17] <ostriches>	 Krenair: Create wikis simply? You must be new here...
[21:51:18] <James_F>	 Krenair: Well yeah, the parsoid config issue is not great.
[21:51:31] <Krenair>	 but since no one in ops knows all the steps and no one outside of ops has permissions for all the steps...
[21:52:02] <grrrit-wm>	 (03CR) 10Krinkle: phab: add cron to clean up old tmp files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[21:52:04] <grrrit-wm>	 (03PS3) 10Dzahn: phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) 
[21:52:15] <Krenair>	 we end up relying on multiple people to get it done
[21:53:57] <mutante>	 to create a new wiki you need to start with "labs db replica"/tell dba people
[21:54:00] <mutante>	 then DNS
[21:54:06] <mutante>	 then mw
[21:54:37] <Krenair>	 sometimes you need to change apache
[21:54:44] <Krenair>	 sometimes you don't need to change DNS
[21:54:54] <mutante>	 ack
[21:55:16] <mutante>	 and then a bunch of changes in other tools and projects that hold lists of all wikis
[21:55:21] <Krenair>	 do you know exactly what the DBA will do?
[21:55:27] <mutante>	 i dont
[21:55:39] <mutante>	 but it's about having to prepare the replicas
[21:55:47] <mutante>	 before the wiki exists
[21:55:53] <mutante>	 or stuff is worse
[21:56:22] <Krenair>	 or, if the wiki is private, prevent replications
[21:56:25] <mutante>	 so it's supposed to be step 1 of the workflow
[21:56:25] <Krenair>	 replication*
[21:57:13] <mutante>	 there is also stuff like restbase config / services 
[21:57:29] <mutante>	 you really get multiple teams involved sometimes
[21:57:53] <mutante>	 the good thing is, our wikitech page that has docs is way better than it used to be
[21:58:14] <mutante>	 improvements from the last couple wiki creations for sure
[21:58:23] <mutante>	 so it's getting faster i think
[21:59:32] <Krenair>	 the last wiki creation was pretty recent, so I am cautiously optimistic that addWiki.php might not die horribly in the middle of working
[22:00:04] <jouncebot>	 Krenair: Dear anthropoid, the time has come. Please deploy Finnish Wikivoyage wiki creation (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161130T2200).
[22:02:37] <grrrit-wm>	 (03PS4) 10Dzahn: phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) 
[22:03:20] <Krenair>	 Dereckson, !
[22:03:28] <Krenair>	 it didn't die horribly!
[22:03:30] <grrrit-wm>	 (03CR) 10Dzahn: phab: add cron to clean up old tmp files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[22:03:43] <Krenair>	 !log Ran mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki fi wikivoyage fiwikivoyage fi.wikivoyage.org
[22:03:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:04:08] <Krenair>	 is it me or is stashbot slower than morebots?
[22:04:42] <Krenair>	 grumble grumble... merge conflict due to wikiversions.json
[22:06:55] <logmsgbot>	 !log bsitzmann@tin Starting deploy [mobileapps/deploy@d004bb4]: mobileapps deployment: 'Update service-mobileapp-node to 14deac7'
[22:07:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:07:25] <grrrit-wm>	 (03PS1) 10Andrew Bogott: base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/324610 (https://phabricator.wikimedia.org/T120159) 
[22:07:43] <grrrit-wm>	 (03PS2) 10Alex Monk: Initial configuration for fi.wikivoyage.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:07:44] <Dereckson>	 Krenair: yeah
[22:07:52] <mutante>	 !log re-enabling puppet on einsteinium, starting ircecho
[22:08:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:04] <logmsgbot>	 !log bsitzmann@tin Finished deploy [mobileapps/deploy@d004bb4]: mobileapps deployment: 'Update service-mobileapp-node to 14deac7' (duration: 01m 09s)
[22:08:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:08:23] <grrrit-wm>	 (03CR) 10Alex Monk: [C: 032] Initial configuration for fi.wikivoyage.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:08:28] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/324610 (https://phabricator.wikimedia.org/T120159) (owner: 10Andrew Bogott) 
[22:08:43] <Dereckson>	 We've a blessed end of 2016, with the ability to create wikis without script issues. Stay tuned for 2017 breakages.
[22:08:57] <icinga-wm>	 RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old.
[22:09:04] <grrrit-wm>	 (03PS2) 10Andrew Bogott: base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/324610 (https://phabricator.wikimedia.org/T120159) 
[22:09:05] <Krenair>	 you had one a few weeks back though right?
[22:09:11] <Dereckson>	 yes
[22:09:34] <grrrit-wm>	 (03Merged) 10jenkins-bot: Initial configuration for fi.wikivoyage.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/323695 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:11:08] <logmsgbot>	 !log krenair@tin Synchronized dblists: https://gerrit.wikimedia.org/r/#/c/323695/ (duration: 00m 49s)
[22:11:22] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:11:40] <logmsgbot>	 !log krenair@tin rebuilt wikiversions.php and synchronized wikiversions files: https://gerrit.wikimedia.org/r/#/c/323695/
[22:11:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:12:05] <grrrit-wm>	 (03CR) 10Andrew Bogott: [C: 032] base: Allow auto puppetmaster switching tuning [puppet] - 10https://gerrit.wikimedia.org/r/324610 (https://phabricator.wikimedia.org/T120159) (owner: 10Andrew Bogott) 
[22:12:56] <logmsgbot>	 !log krenair@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/323695/ (duration: 00m 49s)
[22:13:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:13:20] <Krenair>	 !log Ran mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php fiwikivoyage --backend=local-multiwrite
[22:13:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:14:09] <Deskana>	 One of the proposals on the community wishlist is a regular backup of all the files on Commons.
[22:14:18] <Krenair>	 pff, still not uid 1
[22:14:18] <Deskana>	 Is this not something that's already done, for example via dumps?
[22:14:37] <Deskana>	 apergos: Maybe you know? I think you've worked on dumps, right? :-)
[22:14:45] <Krenair>	 I'd ask Ariel
[22:14:50] <Krenair>	 yeah
[22:14:57] <apergos>	 eh?
[22:15:15] <Deskana>	 This was the discussion: https://meta.wikimedia.org/wiki/2016_Community_Wishlist_Survey/Categories/Commons#Backup_of_Commons_files
[22:15:37] <icinga-wm>	 PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[22:15:37] <icinga-wm>	 PROBLEM - puppet last run on restbase2012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[22:15:38] <apergos>	 of all images? I see
[22:15:47] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[22:15:57] <icinga-wm>	 PROBLEM - puppet last run on mw2075 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/puppet-run]
[22:16:10] <Deskana>	 apergos: Yeah. I wasn't sure if this was something that was already done or not.
[22:16:27] <apergos>	 no, there used to be a live rsync mirror before images were moved to swift
[22:16:33] <apergos>	 but that's still not "backups"
[22:16:40] <Krenair>	 !log Ran the dumpInterwiki.php script but it just produced the existing data, so nothing to do there
[22:16:49] <Deskana>	 apergos: I see.  Thanks!
[22:16:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:16:56] <apergos>	 backups implies that we save copies around. that's a lot of storage. for us or for someone else
[22:16:58] <Krenair>	 (the wikidata sites population is running and is very unhappy)
[22:17:02] <apergos>	 yw
[22:19:07] <wikibugs_>	 06Operations, 10Cassandra, 10RESTBase, 06Services (doing): RESTBase k-r-v as Cassandra anti-pattern (or: revision retention policies considered harmful) - https://phabricator.wikimedia.org/T144431#2836606 (10GWicke) See T94121#2710479 for a summary of my earlier investigation of the wide row issue.
[22:19:52] <Krenair>	 James_F, https://gerrit.wikimedia.org/r/324614
[22:19:56] <mutante>	 apergos: btw, seems you were right about salt-minion running twice, like it's an issue on trusty but not on jessie. also i can use debdeploy to restart services vs salt directly
[22:19:58] <Krenair>	 gwicke, mobrovac_: hey
[22:20:20] <apergos>	 ah gtk
[22:20:57] <mutante>	 that gets me the debdeploy server groups 
[22:21:01] <mutante>	 which are also salt grains
[22:21:33] <Krenair>	 gwicke, mobrovac_: ready for https://gerrit.wikimedia.org/r/#/c/323696/ if ops are?
[22:21:53] <Krenair>	 chasemp, able to run `maintain-views --databases fiwikivoyage --debug` on the labsdb hosts?
[22:22:45] <chasemp>	 Krenair: I'm tied up atm but in a bit or if this drags then first thing in the a.m. probably?
[22:22:47] <icinga-wm>	 RECOVERY - puppet last run on ms-fe3001 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[22:23:01] <Krenair>	 chasemp, okay. I'll leave a note so it shows up in your inbox
[22:23:08] <chasemp>	 thanks
[22:25:47] <icinga-wm>	 RECOVERY - cassandra-c CQL 10.192.48.70:9042 on restbase2012 is OK: TCP OK - 0.037 second response time on 10.192.48.70 port 9042
[22:28:21] <gwicke>	 Krenair: go for it
[22:28:39] <Krenair>	 mutante, mind doing https://gerrit.wikimedia.org/r/#/c/323696/ ?
[22:29:05] <mutante>	 ah, yea, i can do that
[22:29:31] <grrrit-wm>	 (03PS4) 10Dzahn: RESTBase configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:29:37] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] RESTBase configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:29:57] <James_F>	 Krenair: Thanks!
[22:30:27] <grrrit-wm>	 (03CR) 10Dzahn: [V: 032] RESTBase configuration for fi.wikivoyage.org [puppet] - 10https://gerrit.wikimedia.org/r/323696 (https://phabricator.wikimedia.org/T151570) (owner: 10MarcoAurelio) 
[22:31:56] * Stryn waits for MF-Warburg to do the importing
[22:32:19] <grrrit-wm>	 (03PS1) 10ArielGlenn: make lock stale time for incremental dumps a lot shorter [puppet] - 10https://gerrit.wikimedia.org/r/324617 
[22:33:04] <mutante>	 duh, now i have 3 salt-minions on one server
[22:33:45] <Krenair>	 I'm no salt expert but I'm pretty sure that's not supposed to happen
[22:34:14] <grrrit-wm>	 (03PS2) 10ArielGlenn: make lock stale time for incremental dumps a lot shorter [puppet] - 10https://gerrit.wikimedia.org/r/324617 
[22:34:25] <mutante>	 it's when you manually restart it, and then puppet wants to chime in and restart it another time .. and if it's trusty and depending how you start it
[22:34:48] <Krenair>	 it behaves differently with upstart?
[22:35:06] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] make lock stale time for incremental dumps a lot shorter [puppet] - 10https://gerrit.wikimedia.org/r/324617 (owner: 10ArielGlenn) 
[22:36:00] <grrrit-wm>	 (03PS2) 10Jcrespo: Revert "mariadb: depool db1017 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324548 
[22:36:35] <mutante>	 yea
[22:38:43] <Krenair>	 mutante, did the RB change apply everywhere?
[22:39:18] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Revert "mariadb: depool db1017 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324548 (owner: 10Jcrespo) 
[22:39:48] <grrrit-wm>	 (03PS1) 10ArielGlenn: start adds/changes dumps cron job earlier [puppet] - 10https://gerrit.wikimedia.org/r/324618 
[22:39:55] <grrrit-wm>	 (03Merged) 10jenkins-bot: Revert "mariadb: depool db1017 for maintenance and general upgrade" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324548 (owner: 10Jcrespo) 
[22:41:21] <mutante>	 Krenair: i did not force the puppet run 
[22:41:31] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] start adds/changes dumps cron job earlier [puppet] - 10https://gerrit.wikimedia.org/r/324618 (owner: 10ArielGlenn) 
[22:41:46] <Krenair>	 gwicke, we need puppet to run across all RB servers then you need to reload stuff, right?
[22:42:28] <logmsgbot>	 !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool es1017 (duration: 00m 45s)
[22:42:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:43:37] <icinga-wm>	 RECOVERY - puppet last run on wtp2020 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[22:43:38] <icinga-wm>	 RECOVERY - puppet last run on restbase2012 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[22:43:47] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[22:43:57] <icinga-wm>	 RECOVERY - puppet last run on mw2075 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[22:47:19] <grrrit-wm>	 (03PS1) 10ArielGlenn: tweak stale lock refresh interval for incr dumps [dumps] - 10https://gerrit.wikimedia.org/r/324619 
[22:48:59] <Dereckson>	 Krenair: mutante: 3 salt-minion processes is the default config
[22:49:20] <grrrit-wm>	 (03PS2) 10ArielGlenn: tweak stale lock refresh interval for incr dumps [dumps] - 10https://gerrit.wikimedia.org/r/324619 
[22:49:53] <grrrit-wm>	 (03PS5) 10Krinkle: phab: add cron to clean up old tmp files [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[22:49:57] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] tweak stale lock refresh interval for incr dumps [dumps] - 10https://gerrit.wikimedia.org/r/324619 (owner: 10ArielGlenn) 
[22:49:59] <grrrit-wm>	 (03CR) 10Krinkle: "Oh, I see what you mean now :)" [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[22:52:59] <grrrit-wm>	 (03PS20) 10Paladox: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 
[22:53:09] <grrrit-wm>	 (03PS3) 10Paladox: Phabricator: Allow us to change the default web domain in apache [puppet] - 10https://gerrit.wikimedia.org/r/324551 
[22:55:28] <Dereckson>	 or not, I checked a server of mine, and saw 3 too, but https://github.com/saltstack/salt/issues/7733 and https://github.com/saltstack/salt/issues/12217 seem to indicate issues
[22:55:43] <logmsgbot>	 !log ariel@tin Starting deploy [dumps/dumps@04a57c5]: (no message)
[22:55:45] <logmsgbot>	 !log ariel@tin Finished deploy [dumps/dumps@04a57c5]: (no message) (duration: 00m 01s)
[22:55:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:56:17] <ostriches>	 apergos: Whee !logging :D
[22:56:29] <apergos>	 indeed but I forgot to put a message in
[22:56:43] <apergos>	 and I was even happy to see that feature announced (saw it in mail today)
[22:59:32] <wikibugs_>	 06Operations, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Asynchronous processing in production: one queue to rule them all - https://phabricator.wikimedia.org/T149408#2836718 (10greg) p:05Triage>03Normal
[23:00:52] <grrrit-wm>	 (03PS1) 10ArielGlenn: fix usage message [dumps] - 10https://gerrit.wikimedia.org/r/324622 
[23:03:18] <wikibugs>	 06Operations, 10Gerrit, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016  30/11/2016 - https://phabricator.wikimedia.org/T148478#2836728 (10Paladox) The cpu seems to be still very high https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=cpu_report&c=Miscellane...
[23:05:06] <Krenair>	 gwicke, ping
[23:05:43] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] "@Dzahn can you run puppet compiler please?" [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox) 
[23:05:49] <grrrit-wm>	 (03CR) 10Paladox: [C: 031] "@Dzahn can you run puppet compiler please?" [puppet] - 10https://gerrit.wikimedia.org/r/324551 (owner: 10Paladox) 
[23:08:56] <gwicke>	 Krenair: pong
[23:09:43] <Krenair>	 gwicke, how are the restarts going?
[23:09:56] <grrrit-wm>	 (03PS1) 10BryanDavis: toollabs: remove host aliases for tools-exec-12[01-11] [puppet] - 10https://gerrit.wikimedia.org/r/324623 (https://phabricator.wikimedia.org/T151980) 
[23:10:02] <gwicke>	 I'm not restarting anything
[23:10:25] <gwicke>	 typically ops or Marko would do that after merging the puppet change
[23:11:00] <Krenair>	 So when I asked if you were ready for the patch
[23:11:02] <Krenair>	 And you said yes
[23:11:05] <Krenair>	 What you meant was no
[23:17:18] <grrrit-wm>	 (03PS1) 10Alex Monk: Revert "RESTBase configuration for fi.wikivoyage.org" [puppet] - 10https://gerrit.wikimedia.org/r/324624 
[23:17:23] <Krenair>	 mutante: https://gerrit.wikimedia.org/r/324624
[23:19:40] <grrrit-wm>	 (03PS2) 10ArielGlenn: fix usage message [dumps] - 10https://gerrit.wikimedia.org/r/324622 
[23:21:37] <gwicke>	 Krenair: oh, I thought you were asking about whether there is any reason to not deploy this now
[23:21:48] <gwicke>	 sorry if I was being unclear
[23:22:18] <Krenair>	 a reason not to deploy this now would be that we're not ready to make production state reflect the filesystem/puppet
[23:22:27] <gwicke>	 the process is documented at https://wikitech.wikimedia.org/wiki/RESTBase#Deploy_configuration_changes
[23:22:30] <jdlrobson>	 Hey... we noticed that https://en.m.wikipedia.org/wiki/Portal:Space is linking to action=purge urls
[23:22:34] <jdlrobson>	 I'm guessing it shouldn't do that?
[23:22:48] <jdlrobson>	 (has a link Purge server cache)
[23:22:53] <grrrit-wm>	 (03PS3) 10ArielGlenn: fix usage message [dumps] - 10https://gerrit.wikimedia.org/r/324622 
[23:22:57] <jdlrobson>	 no idea why.
[23:23:54] <Krenair>	 jdlrobson, it'll be one of the templates in there, they're able to make links like that
[23:24:12] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] fix usage message [dumps] - 10https://gerrit.wikimedia.org/r/324622 (owner: 10ArielGlenn) 
[23:24:32] <Krenair>	 jdlrobson, it's https://en.wikipedia.org/wiki/Template:Purge_page
[23:24:48] <grrrit-wm>	 (03PS1) 10ArielGlenn: miscdumps: use log.info for verbose only [dumps] - 10https://gerrit.wikimedia.org/r/324625 
[23:25:08] <jdlrobson>	 but should we be encouraging purging of pages by readers/crawlers? seems strange if you don't understand what that means
[23:25:12] <Krenair>	 apparently a lot of portals use it: https://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Purge_page&limit=500
[23:25:42] <Krenair>	 probably not
[23:26:29] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] miscdumps: use log.info for verbose only [dumps] - 10https://gerrit.wikimedia.org/r/324625 (owner: 10ArielGlenn) 
[23:27:50] <jdlrobson>	 mm
[23:27:54] <jdlrobson>	 thanks for the context Krenair 
[23:28:43] <Krenair>	 might be a good idea to find the revision which added that transclusion and find out why
[23:37:03] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 04-1] "Trickier than it looks :(" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/324601 (https://phabricator.wikimedia.org/T150396) (owner: 10Dzahn) 
[23:44:09] <alchimista>	 hello Krenair, news from the recreation wiki (lol) ?
[23:44:25] <Krenair>	 alchimista, what?
[23:44:49] <alchimista>	 Krenair, https://phabricator.wikimedia.org/T126832 the wiki recreation, of ptwmp
[23:45:03] <Krenair>	 what about it?
[23:45:56] <alchimista>	 as far as i understand, there are no references, so the db can be "cleaned"
[23:46:24] <alchimista>	 How's the work going?
[23:48:30] <Krenair>	 I'm not aware of anyone working on it
[23:48:47] <Krenair>	 If someone was I would expect it to be clear on the task
[23:49:53] <mutante>	 Krenair: back, so we are reverting. 
[23:50:23] <grrrit-wm>	 (03PS2) 10Dzahn: Revert "RESTBase configuration for fi.wikivoyage.org" [puppet] - 10https://gerrit.wikimedia.org/r/324624 (owner: 10Alex Monk) 
[23:50:59] <grrrit-wm>	 (03CR) 10Dzahn: [C: 032] Revert "RESTBase configuration for fi.wikivoyage.org" [puppet] - 10https://gerrit.wikimedia.org/r/324624 (owner: 10Alex Monk) 
[23:51:20] <wikibugs>	 06Operations, 06Analytics-Kanban, 10hardware-requests: stat1001 replacement box in eqiad - https://phabricator.wikimedia.org/T149911#2836823 (10RobH) 05Open>03Resolved reinstalled, puppet and salt keys accepted.  it has some puppet failures, but since those are service related, i'll leave them to you to...
[23:57:09] <icinga-wm>	 PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:57:41] <grrrit-wm>	 (03PS21) 10Dzahn: Phabricator: Allow us to change the default web domain [puppet] - 10https://gerrit.wikimedia.org/r/324408 (owner: 10Paladox)