[00:00:41] <Krenair>	 bd808, ToAruShiroiNeko: Depends on the type of wiki
[00:01:24] <Krenair>	 If you're after wikimania2014wiki, there's a ticket open for that - waiting for the local organisers to comment
[00:02:19] <ToAruShiroiNeko>	 I am bothered by all closed wikis
[00:02:22] <Krenair>	 There is https://meta.wikimedia.org/wiki/Closing_projects_policy for the standard wikis
[00:02:30] <ToAruShiroiNeko>	 I wish they remain semi-editable
[00:02:44] <Krenair>	 closed wikis are semi-editable
[00:02:49] <Krenair>	 very few people can edit them
[00:03:17] <ToAruShiroiNeko>	 SUL unification, username renames, deletion of copyrighted content etc.
[00:03:43] <Krenair>	 They still run global user renames
[00:03:50] <ToAruShiroiNeko>	 I am told username renames arent possible
[00:03:56] <Krenair>	 local user renames sure
[00:04:05] <ToAruShiroiNeko>	 they were locked before SUL
[00:06:30] <Krenair>	 ToAruShiroiNeko, and didn't ever have SUL finalisation?
[00:06:46] <ToAruShiroiNeko>	 unfortunately no
[00:07:55] <Krenair>	 which wikis are these?
[00:07:56] <Krenair>	 legoktm, ^
[00:08:12] <ToAruShiroiNeko>	 Krenair its a long list
[00:08:35] <legoktm>	 huh?
[00:08:41] <ToAruShiroiNeko>	 http://tools.wmflabs.org/meta/userpages/White+Cat
[00:08:41] <legoktm>	 ToAruShiroiNeko: what wikis?
[00:08:47] <ToAruShiroiNeko>	 any page that isnt a redirect
[00:09:58] <legoktm>	 https://simple.wikibooks.org/wiki/Special:Log/Maintenance_script looks fine to me
[00:13:33] <ToAruShiroiNeko>	 so how can I get the White Cat accounts merged to my current user?
[00:15:14] <Nemo_bis>	 lol still hoping
[00:16:17] <Krenair>	 I imagine you'll have to wait for the user merge tool?
[00:19:19] <ToAruShiroiNeko>	 I have been waiting for this for over four years :p
[00:19:28] <ToAruShiroiNeko>	 I can wait four more if need be
[00:19:31] <ToAruShiroiNeko>	 but not more :p
[00:21:41] <YuviPanda>	 nemo_bis it's in progress! legoktm is hard at work merging my accounts! :D
[00:32:21] <grrrit-wm>	 (PS1) BBlack: VCL: remove fqdn comment line [puppet] - https://gerrit.wikimedia.org/r/228584
[00:32:24] <grrrit-wm>	 (PS1) BBlack: VCL: remove restrict_access from text/upload backends [puppet] - https://gerrit.wikimedia.org/r/228585
[00:32:26] <grrrit-wm>	 (PS1) BBlack: network::constants::all_networks(_lo)? via flatten() [puppet] - https://gerrit.wikimedia.org/r/228586
[00:32:28] <grrrit-wm>	 (PS1) BBlack: VCL: use network::constants::all_networks_lo for ssl_proxies [puppet] - https://gerrit.wikimedia.org/r/228587
[00:32:30] <grrrit-wm>	 (PS1) BBlack: VCL: remove unused probes "swift", "options" [puppet] - https://gerrit.wikimedia.org/r/228588
[00:32:32] <grrrit-wm>	 (PS1) BBlack: VCL: define vcl_config "layer" for parsoidcache [puppet] - https://gerrit.wikimedia.org/r/228589
[00:32:34] <grrrit-wm>	 (PS1) BBlack: vhtcpd: /etc/init/varnishhtcpd.conf is long-gone now [puppet] - https://gerrit.wikimedia.org/r/228590
[00:32:36] <grrrit-wm>	 (PS1) BBlack: varnish: get rid of some pre-systemd cruft [puppet] - https://gerrit.wikimedia.org/r/228591
[01:22:17] <icinga-wm>	 PROBLEM - puppet last run on analytics1044 is CRITICAL Puppet last ran 6 hours ago
[01:55:26] <Jamesofur>	 really? the patch gets rejected because my registered email address has a capital locally to start rather than a lowercase on gerrit? /grumbles grumbles/
[01:57:05] <legoktm>	 Jamesofur: the username part of email addresses is case-sensitive :)
[01:57:29] * Jamesofur glares
[01:57:48] * Jamesofur also apparently has to unstage the commit to get it to realize he changed the email address
[01:57:59] <legoktm>	 Jamesofur: git commit --amend --reset-author
[01:58:08] <Jamesofur>	 :)
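A minimal sketch of the point legoktm makes at 01:57:05: the part of an address before the "@" may be treated as case-sensitive, while the domain part is not. The function name and the example addresses are hypothetical, and whether Gerrit compares addresses in exactly this way is an assumption.

```python
def same_mailbox(a: str, b: str) -> bool:
    """Compare two email addresses.

    The local part (before the last '@') is compared exactly,
    the domain part is compared case-insensitively.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    return local_a == local_b and domain_a.lower() == domain_b.lower()


# Hypothetical addresses, for illustration only.
print(same_mailbox("james@Example.org", "james@example.org"))
print(same_mailbox("James@example.org", "james@example.org"))
```

Under this comparison an address that differs only in the capitalisation of its first letter counts as a different mailbox, which would explain the rejected patch.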
[01:59:37] <grrrit-wm>	 (PS1) Jalexander: Replace ssh key for jamesur [puppet] - https://gerrit.wikimedia.org/r/228597
[02:20:02] <logmsgbot>	 !log l10nupdate Synchronized php-1.26wmf16/cache/l10n: (no message) (duration: 06m 11s)
[02:20:15] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:23:09] <logmsgbot>	 !log @tin LocalisationUpdate completed (1.26wmf16) at 2015-08-02 02:23:09+00:00
[02:23:17] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:33:48] <icinga-wm>	 PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:34:16] <YuviPanda>	 hmm, site loads for me
[02:34:39] <YuviPanda>	 logged in and logged out
[02:34:43] <YuviPanda>	 what's going on
[02:35:48] <YuviPanda>	 oh
[02:35:50] <YuviPanda>	 ipv6
[02:35:53] <James_F>	 Yeah.
[02:36:33] <YuviPanda>	 that's been flapping now and then
[02:38:07] <icinga-wm>	 RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 497 bytes in 0.018 second response time
[02:54:57] <icinga-wm>	 PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[02:56:57] <icinga-wm>	 RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 18558 bytes in 3.060 second response time
[03:36:17] <icinga-wm>	 PROBLEM - puppet last run on mw1050 is CRITICAL Puppet has 1 failures
[03:36:48] <icinga-wm>	 PROBLEM - puppet last run on mw1109 is CRITICAL Puppet has 1 failures
[03:37:16] <icinga-wm>	 PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:38:17] <robh>	 blahhhhh
[03:38:29] <robh>	 stop alerting ipv6 my new alert tone is horrible
[03:39:36] <andrewbogott>	 Pagerduty is doing a good job of training me to ignore it
[03:43:06] <icinga-wm>	 RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 497 bytes in 0.003 second response time
[03:46:12] <robh>	 andrewbogott: well, dont blame pagerduty
[03:46:15] <robh>	 its really our fault
[03:46:17] <robh>	 its our check ;D
[03:46:53] <robh>	 if you dont use the pagerduty app its less annoying but meh, its supposed to annoy us really.
[03:47:45] <andrewbogott>	 ALERT #86, #87 on ops-gmtminus.  Replay 154: Ack all, 156: Resolv all
[03:48:01] <andrewbogott>	 It’s hard for me to take cryptic texts like that seriously.  Does every page amount to ‘check your email’?
[03:48:07] <andrewbogott>	 And if so, maybe they should just say that :)
[03:48:11] <robh>	 nope, but im using the app
[03:48:23] <robh>	 i'll change over to the normal sms on monday and try to make them more useful
[03:48:33] <andrewbogott>	 ok, I’ll give the app a try tomorrow.
[03:48:50] <andrewbogott>	 In the meantime… should I actually try to fix that ipv6 flap?  I have no idea where that is or what it means :)
[03:53:24] <robh>	 I think its just an oversensitive check
[03:53:32] <robh>	 but its been ongoing for month+
[03:53:53] <robh>	 and now everyone is getting them and being annoyed.... i imagine we'll discuss in ops meeting now ;D
[03:55:13] <andrewbogott>	 yeah, maybe that counts as the system working :)
[03:58:59] <chasemp>	 I cleared a few earlier but count catch it in the act, not sure what to do ATM but family stuff going on here so time is limited 
[03:59:13] <chasemp>	 Couldn't catch I mean
[04:00:32] <robh>	 yea when i hit the computer i happen to login to pagerduty dashboard and ack them for both 'zones'  i hate the zones shit too
[04:00:39] <robh>	 so i'm trying out a competitor on monday
[04:00:54] <robh>	 similar feature set but unlimited # of contacts in a single page event
[04:01:04] <robh>	 unlike pagerduty's 10 (which leads to odd shit in config)
[04:01:29] <robh>	 cuz now when we ack it
[04:01:34] <robh>	 it doesnt ack it for the gmt+ folks
[04:01:35] <robh>	 it sucks.
[04:01:52] <robh>	 so i click on all teams in app and ack for both when i do it
[04:01:57] <robh>	 but its annoying as hell.
[04:02:26] <icinga-wm>	 RECOVERY - puppet last run on mw1050 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures
[04:02:47] <icinga-wm>	 RECOVERY - puppet last run on mw1109 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:56:29] <logmsgbot>	 !log @tin ResourceLoader cache refresh completed at Sun Aug  2 04:56:29 UTC 2015 (duration 56m 28s)
[04:56:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:44:37] <icinga-wm>	 PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 0 below the confidence bounds
[05:55:47] <icinga-wm>	 PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 16.67% of data above the critical threshold [500.0]
[06:29:48] <icinga-wm>	 PROBLEM - puppet last run on mw1177 is CRITICAL puppet fail
[06:30:56] <icinga-wm>	 PROBLEM - puppet last run on db2044 is CRITICAL puppet fail
[06:30:57] <icinga-wm>	 PROBLEM - puppet last run on db1028 is CRITICAL puppet fail
[06:31:06] <icinga-wm>	 PROBLEM - puppet last run on mc2007 is CRITICAL Puppet has 1 failures
[06:31:17] <icinga-wm>	 PROBLEM - puppet last run on db1067 is CRITICAL puppet fail
[06:31:37] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL Puppet has 2 failures
[06:31:48] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL Puppet has 1 failures
[06:31:56] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL Puppet has 1 failures
[06:32:57] <icinga-wm>	 PROBLEM - puppet last run on db1045 is CRITICAL Puppet has 1 failures
[06:32:58] <icinga-wm>	 PROBLEM - puppet last run on wtp2017 is CRITICAL Puppet has 1 failures
[06:32:58] <icinga-wm>	 PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 1 failures
[06:32:58] <icinga-wm>	 PROBLEM - puppet last run on mw2050 is CRITICAL Puppet has 1 failures
[06:33:07] <icinga-wm>	 PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 1 failures
[06:33:56] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL Puppet has 1 failures
[06:33:57] <icinga-wm>	 PROBLEM - puppet last run on mw2018 is CRITICAL Puppet has 1 failures
[06:55:56] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:56:57] <icinga-wm>	 RECOVERY - puppet last run on db1045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:58] <icinga-wm>	 RECOVERY - puppet last run on db1028 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:06] <icinga-wm>	 RECOVERY - puppet last run on db2044 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:57:07] <icinga-wm>	 RECOVERY - puppet last run on wtp2017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:07] <icinga-wm>	 RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:57:07] <icinga-wm>	 RECOVERY - puppet last run on mw2050 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:57:08] <icinga-wm>	 RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] <icinga-wm>	 RECOVERY - puppet last run on mc2007 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:17] <icinga-wm>	 RECOVERY - puppet last run on db1067 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:37] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on mw1177 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures
[06:58:06] <icinga-wm>	 RECOVERY - puppet last run on mw2018 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:58:17] <icinga-wm>	 RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[07:19:57] <icinga-wm>	 PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1419 bytes in 0.306 second response time
[07:39:16] <icinga-wm>	 RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected
[08:21:56] <icinga-wm>	 RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1413 bytes in 0.118 second response time
[08:40:29] <grrrit-wm>	 (PS1) Legoktm: Set an explicit 'wgLanguageCode' entry for metawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612)
[08:41:32] <grrrit-wm>	 (CR) Legoktm: "Needed by I53aa995d385b09bae41b210664b45143d7789861" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[10:10:27] <wikibugs>	 operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1501358 (Tau) I tried several maintenance scripts (purgeList, checkImages, rebuildImages etc.) but none of them helped. It'...
[10:16:15] <wikibugs>	 operations, ops-ulsfo: RIPE Atlas Anchor @ ulsfo is down - https://phabricator.wikimedia.org/T107691#1501361 (faidon) NEW
[10:40:18] <icinga-wm>	 PROBLEM - puppet last run on cp4011 is CRITICAL puppet fail
[11:06:57] <icinga-wm>	 RECOVERY - puppet last run on cp4011 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures
[11:26:35] <grrrit-wm>	 (PS1) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[11:27:36] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[11:31:07] <grrrit-wm>	 (PS2) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[11:31:55] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[11:33:28] <grrrit-wm>	 (PS3) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[11:34:09] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[11:39:02] <grrrit-wm>	 (PS4) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[11:39:48] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[11:43:08] <grrrit-wm>	 (PS5) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[11:43:49] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[11:54:22] <Steinsplitter>	 bblack u there O_O?
[11:55:28] <icinga-wm>	 PROBLEM - Host text-lb.esams.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:862:ed1a::1
[11:55:30] <bblack>	 yeah somewhat
[11:55:35] <bblack>	 why?
[11:55:50] <sjoerddebruin>	 Things are a little bit slow, I think...
[11:56:02] <bblack>	 what do you mean?
[11:56:27] <icinga-wm>	 RECOVERY - Host text-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 88.52 ms
[11:57:11] <Steinsplitter>	 yesterday i renamed, together with hoo, an account with 60000 edits (nothing happened), now i again have an account with 60000 edits to rename. Is it okay for you if i rename it now or should i wait for him? <--bblack
[11:57:14] <bblack>	 yeah something happened in the graphs, not sure what yet
[11:57:50] <bblack>	 Steinsplitter: I have no idea what that really means in technical terms, but my advice would be if you have to ask, don't do it over the weekend.
[11:58:01] <Steinsplitter>	 ok
[11:58:22] <paravoid>	 what's going on?
[11:58:29] <_joe_>	 bblack: hey
[11:58:36] <bblack>	 looks kinda like the synflood the other day, but at esams?
[11:58:37] <_joe_>	 what paravoid said
[11:58:49] <bblack>	 still staring at graphs, already mostly over I think
[11:59:12] <bblack>	 http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=LVS+loadbalancers+esams&m=cpu_report&s=by+name&mc=2&g=network_report
[12:00:58] <Steinsplitter>	 ( https://upload.wikimedia.org/wikipedia/commons/0/03/Server-kitty.jpg )
[12:05:27] <bblack>	 !log started pybal on lvs3001
[12:05:31] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:06:03] <grrrit-wm>	 (PS6) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[12:06:46] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[12:28:47] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:29:17] <icinga-wm>	 PROBLEM - salt-minion processes on analytics1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:30:38] <icinga-wm>	 PROBLEM - dhclient process on analytics1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:30:38] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1043 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:37:05] <grrrit-wm>	 (CR) Glaisher: "https://github.com/wikimedia/operations-mediawiki-config/blob/d2813e1b8ae7e9e35414a30b1cb68a56e4033f71/wmf-config/CommonSettings.php#L889 " [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[12:38:36] <grrrit-wm>	 (CR) Glaisher: "Also does wgConf->get() stuff not work for settings in CommonSettings?" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[12:55:45] <jynus>	 is anyone doing something on the hadoop/analytics hosts? I will restart a couple of them otherwise
[13:00:50] <wikibugs>	 operations, Ipv6: Fix IPv6 autoconf issues once and for all, across the fleet. - https://phabricator.wikimedia.org/T102099#1501465 (BBlack)
[13:06:28] <jynus>	 ^dmesg is full of kernel bugs, shutdown/ps/etc does not work, will powercycle
[13:10:50] <jynus>	 !log powercycling analytics1043: kernel issues
[13:10:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:12:57] <icinga-wm>	 PROBLEM - Host analytics1043 is DOWN: PING CRITICAL - Packet loss = 100%
[13:13:56] <icinga-wm>	 RECOVERY - Hadoop DataNode on analytics1043 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[13:13:57] <icinga-wm>	 RECOVERY - Host analytics1043 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[13:14:27] <icinga-wm>	 RECOVERY - salt-minion processes on analytics1043 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:14:46] <jynus>	 maybe got fried due to the recent logs, but when ps gets locked, not a good signal
[13:15:36] <icinga-wm>	 RECOVERY - dhclient process on analytics1043 is OK: PROCS OK: 0 processes with command name dhclient
[13:15:36] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1043 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[13:26:25] <jynus>	 !log powercycling analytics1044: same kernel fatal issues as 1043
[13:26:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:28:56] <icinga-wm>	 PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[13:29:36] <icinga-wm>	 PROBLEM - Host analytics1044 is DOWN: PING CRITICAL - Packet loss = 100%
[13:29:57] <icinga-wm>	 RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 44.12 ms
[13:31:17] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1044 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[13:31:17] <icinga-wm>	 RECOVERY - Hadoop DataNode on analytics1044 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[13:31:26] <icinga-wm>	 RECOVERY - Host analytics1044 is UP: PING OK - Packet loss = 0%, RTA = 1.30 ms
[13:32:27] <icinga-wm>	 RECOVERY - dhclient process on analytics1044 is OK: PROCS OK: 0 processes with command name dhclient
[13:32:27] <icinga-wm>	 RECOVERY - salt-minion processes on analytics1044 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:32:56] <icinga-wm>	 RECOVERY - puppet last run on analytics1044 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures
[13:39:06] <icinga-wm>	 PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL 10.71% of data above the critical threshold [100000000.0]
[13:41:58] <wikibugs>	 operations: kernel BUG at /build/buildd/linux-3.13.0/mm/memory.c:3756 for analytics1044 and analytics1043 - https://phabricator.wikimedia.org/T107698#1501534 (jcrespo) NEW
[13:43:30] <jynus>	 ^reported it but it does not require immediate actionables
[14:24:46] <icinga-wm>	 RECOVERY - Outgoing network saturation on labstore1003 is OK Less than 10.00% above the threshold [75000000.0]
[14:48:09] <grrrit-wm>	 (PS1) Faidon Liambotis: mail: bump MX's spamassassin max_children to 32 [puppet] - https://gerrit.wikimedia.org/r/228656
[14:48:32] <grrrit-wm>	 (CR) Faidon Liambotis: [C: 2] mail: bump MX's spamassassin max_children to 32 [puppet] - https://gerrit.wikimedia.org/r/228656 (owner: Faidon Liambotis)
[15:05:11] <grrrit-wm>	 (CR) Nemo bis: "Thanks for taking care of spam. :)" [puppet] - https://gerrit.wikimedia.org/r/228656 (owner: Faidon Liambotis)
[15:46:52] <Glaisher>	 https://zu.wikipedia.org/static/images/project-logos/default.png https://zu.wikipedia.org/w/static/images/project-logos/default.png
[15:47:05] <Glaisher>	 can the cache be cleared from that?
[15:47:19] <Glaisher>	 looks like it's causing some users to be served foundation logo on wikipedias
[15:47:24] <Glaisher>	 bblack: ^
[15:49:27] <Glaisher>	 also, can you explain why some users are being served from /w/static while others from /static
[15:53:01] <wikibugs>	 operations, Varnish: Figure out purging of static logos for updates - https://phabricator.wikimedia.org/T106620#1501604 (Glaisher) <Glaisher> https://zu.wikipedia.org/static/images/project-logos/default.png https://zu.wikipedia.org/w/static/images/project-logos/default.png <Glaisher>can the cache be cleare...
[15:54:33] <wikibugs>	 operations, Varnish: Figure out purging of static logos for updates - https://phabricator.wikimedia.org/T106620#1501606 (Glaisher) p:Triage>High Changing to high because users shouldn't be seeing foundation logo on Wikipedias. Also why is it set to expire on 2016? That seems a bit lengthy.
[15:56:29] <icinga-wm>	 PROBLEM - puppet last run on mw2202 is CRITICAL Puppet has 1 failures
[16:09:15] <hoo>	 enwiki job queue is all enqueue jobs... yay :/
[16:22:47] <icinga-wm>	 RECOVERY - puppet last run on mw2202 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:13:45] <bblack>	 Glaisher: I don't have good explanations for anything off the top of my head
[17:14:01] <bblack>	 was there a change to the logo image and/or path?
[17:14:17] <bblack>	 (is the cache actually different from what MW serves?)
[17:14:57] <bblack>	 (is this an all-wikis problem, or just "zu"?)
[17:16:15] <wikibugs>	 operations: Configure librenms to use LDAP for authentication - https://phabricator.wikimedia.org/T107702#1501635 (ori) NEW
[17:17:05] <Glaisher>	 bblack: Looks like there was a change to the logo recently.
[17:17:31] <bblack>	 do you know what the change was? like a gerrit link or something?
[17:17:58] <Glaisher>	 https://github.com/wikimedia/operations-mediawiki-config/commit/05d2bd0a6edd4a224ce72305a14c0683232c1d7f
[17:19:02] <Glaisher>	 hmm.. so the wikipedia logo (the correct one) is actually the old one
[17:19:24] <Krenair>	 which wiki was this?
[17:19:36] <Glaisher>	 zu.wikipedia.org
[17:19:41] <Glaisher>	 See #wikipedia's scrollback
[17:20:18] <Krenair>	 wtf, why are they pointing to default.png?
[17:20:48] <ori>	 Krenair: they don't define anything more specific, I think
[17:21:18] <bblack>	 I've gotta run, I have a lunch appt to make
[17:21:20] <Krenair>	 ahh
[17:21:22] <bblack>	 I'll check back in later, though :)
[17:21:30] <ori>	 bblack: we can sort it out :) take care
[17:21:39] <Krenair>	 one moment
[17:21:42] <ori>	 Krenair: or possibly they do, but not the HD variant
[17:23:32] <Krenair>	 my fault Glaisher, fixing
[17:24:21] <grrrit-wm>	 (PS1) Alex Monk: Default wikipedias to enwiki.png [mediawiki-config] - https://gerrit.wikimedia.org/r/228676
[17:24:23] <Glaisher>	 but why is /w/static and /static different?
[17:24:31] <Krenair>	 one is cached and one isn't
[17:24:40] <Krenair>	 the cached one is out of date and we don't know how to fix it
[17:25:11] <grrrit-wm>	 (CR) Glaisher: [C: 1] Default wikipedias to enwiki.png [mediawiki-config] - https://gerrit.wikimedia.org/r/228676 (owner: Alex Monk)
[17:25:12] <ori>	 wait, we're using uncached URLs for the project logo now?
[17:25:32] <grrrit-wm>	 (CR) Ori.livneh: [C: 1] "Go ahead and deploy." [mediawiki-config] - https://gerrit.wikimedia.org/r/228676 (owner: Alex Monk)
[17:25:54] <Krenair>	 no
[17:26:07] <ori>	 Krenair: actually, do you mind if I sync that?
[17:26:11] <Glaisher>	 we're still using /static
[17:26:12] <ori>	 I want to test a change to my deployment helper scripts
[17:26:31] <Krenair>	 sure
[17:26:39] <ori>	 ok. Glaisher, gimme 5 mins.
[17:27:31] <grrrit-wm>	 (CR) Ori.livneh: [C: 2] Default wikipedias to enwiki.png [mediawiki-config] - https://gerrit.wikimedia.org/r/228676 (owner: Alex Monk)
[17:27:37] <grrrit-wm>	 (Merged) jenkins-bot: Default wikipedias to enwiki.png [mediawiki-config] - https://gerrit.wikimedia.org/r/228676 (owner: Alex Monk)
[17:28:52] <grrrit-wm>	 (CR) Alex Monk: "I think this was affecting about 23 wikis. Mentioned by Glaisher on T106620" [mediawiki-config] - https://gerrit.wikimedia.org/r/228676 (owner: Alex Monk)
[17:29:44] <wikibugs>	 operations, Varnish: Figure out purging of static logos for updates - https://phabricator.wikimedia.org/T106620#1501647 (Krenair) That issue with people seeing the foundation logo was separate to the purpose of this bug, see https://gerrit.wikimedia.org/r/#/c/228676/
[17:31:24] <Glaisher>	 Krenair: we actually have a specific logo for zuwiki
[17:32:19] <Krenair>	 we do?
[17:32:21] <Glaisher>	 Can we have a list of wikis with the specific $lang$site.png but not in InitialiseSettings.php?
[17:32:38] <Krenair>	 oh, yeah
[17:32:39] <Glaisher>	 https://github.com/wikimedia/operations-mediawiki-config/blob/master/w/static/images/project-logos/zuwiki.png
[17:32:48] <Glaisher>	 but english :p
[17:32:51] <Krenair>	 Yes
[17:32:57] <Krenair>	 It's a copy of enwiki.png
[17:33:18] <Krenair>	 Would've been a copy of default.png when it was added. I don't know why we ended up with such a mess in that directory
[17:34:23] <Krenair>	 wikis without specifically configured logos just got the one they had inherited from their project (or the default, enwiki's one) downloaded
[17:35:01] <Krenair>	 even though it was unused
[17:35:37] <Glaisher>	 mhm..
[17:36:08] <Glaisher>	 why were some users being served the outdated cached one while others the new one?
[17:47:16] <grrrit-wm>	 (CR) Alex Monk: "Returns null for me" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[17:52:17] <logmsgbot>	 !log ori Synchronized wmf-config/InitialiseSettings.php: If7fcb6e6: Default wikipedias to enwiki.png (duration: 00m 12s)
[17:52:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:52:41] <ori>	 woooooo
[17:53:13] <ori>	 https://dpaste.de/LkKp/raw
[17:54:51] <ori>	 the script queried the API for recently-merged changes in wmf/* branches in mediawiki/core and in master of operations/mediawiki-config, checked which ones had not been merged locally, computed the most concise invocation of either sync-dir/sync-file, and generated the log message
[17:56:19] <ori>	 https://gist.github.com/atdt/f1befbf448100a339ee0
[17:57:28] <hoo>	 That sounds handy... and also very dangerous (given it doesn't ask for confirmation)
[17:57:33] <ori>	 heh
[17:57:37] <Krenair>	 yes, I wouldn't run that
[17:57:41] <ori>	 i need to add some safety features (prompting for confirmation, handling failures) and handling of submodules
[17:58:08] <ori>	 it's not done yet :P
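One step ori describes at 17:54:51, choosing between sync-file and sync-dir for a set of changed files, could be sketched roughly as below. This is not the code in the linked gist: the function name, the rule (one changed file gives sync-file, several give sync-dir on their common directory) and the argument layout of the returned command are all assumptions made for illustration. Like the script under discussion, the sketch only builds a command and runs nothing, so a confirmation prompt would still have to wrap it.

```python
import posixpath


def concise_sync_command(changed_paths, message):
    """Return an argv-style list for a single sync invocation, or None.

    Hypothetical rule: one changed file -> sync-file on that file;
    several changed files -> sync-dir on their deepest common directory.
    Paths are repository-relative and use forward slashes.
    """
    paths = sorted(set(changed_paths))
    if not paths:
        return None
    if len(paths) == 1:
        return ["sync-file", paths[0], message]
    # Deepest directory shared by every changed file ("." for the repo root).
    common = posixpath.commonpath([posixpath.dirname(p) for p in paths])
    return ["sync-dir", common or ".", message]


print(concise_sync_command(
    ["wmf-config/InitialiseSettings.php"],
    "Default wikipedias to enwiki.png",
))
```

A rule this simple can sync much more than intended when the changed files share only a top-level directory, which is one more reason to ask for confirmation before running the result.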
[17:59:00] <Krenair>	 ori, how can we clear the varnish cache of static images which have changed?
[18:01:07] <ori>	 ideally, we don't; we wait for the change to propagate
[18:01:29] <hoo>	 yeah
[18:01:34] <hoo>	 and if not... poke Brandon
[18:01:46] <ori>	 the "holiday logo" use-case can be satisfied with MediaWiki:Common.css
[18:02:18] <ori>	 permanent updates to logos are sufficiently infrequent that they can require either waiting for cache propagation or asking someone in ops to ban the url
[18:02:19] <hoo>	 but it IMO shouldn't
[18:02:52] <hoo>	 (That's because I have general concerns with that kind of CSS)
[18:03:35] <ori>	 Krenair: in this case, if the wrong logo is cached (or possibly cached), ping b.black with the URL
[18:04:36] <Krenair>	 */static/images/project-logos/default.png ?
[18:04:48] <grrrit-wm>	 (CR) Hoo man: "@Lokal Profil: Can you please bring your github repo up to date or do you want to do the primary development here now?" [puppet] - https://gerrit.wikimedia.org/r/219800 (https://phabricator.wikimedia.org/T103087) (owner: Lokal Profil)
[18:05:55] <ori>	 Krenair: how many wikis does '*' represent in this case?
[18:06:33] <Krenair>	 I don't think I actually should have needed default.png purged
[18:07:05] <Krenair>	 although clearly it did cause an issue on a few wikis due to a mistake
[18:07:24] <Krenair>	 but we should be able to purge a logo file everywhere
[18:34:37] <icinga-wm>	 PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 0 below the confidence bounds
[19:00:37] <icinga-wm>	 PROBLEM - puppet last run on cp3048 is CRITICAL puppet fail
[19:26:47] <icinga-wm>	 RECOVERY - puppet last run on cp3048 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures
[19:27:45] <grrrit-wm>	 (PS1) Tim Landscheidt: Ignore warnings about URLs without modules for volatile directory [puppet] - https://gerrit.wikimedia.org/r/228682 (https://phabricator.wikimedia.org/T87132)
[19:28:50] <grrrit-wm>	 (CR) Tim Landscheidt: "If my description of the volatile directory is wrong, please amend." [puppet] - https://gerrit.wikimedia.org/r/228682 (https://phabricator.wikimedia.org/T87132) (owner: Tim Landscheidt)
[19:41:03] <wikibugs>	 operations, Varnish: Figure out purging of static logos for updates - https://phabricator.wikimedia.org/T106620#1501729 (BBlack) >>! In T106620#1501606, @Glaisher wrote: > Changing to high because users shouldn't be seeing foundation logo on Wikipedias. Also why is it set to expire on 2016? That seems a bi...
[19:43:04] <bblack>	 some crazy things going on with reqerr rates the past 24h, two little lumps, one ongoing
[19:43:10] <bblack>	 they're not very big, just...  odd
[19:43:11] <bblack>	 https://gdash.wikimedia.org/dashboards/reqerror/
[19:52:29] <grrrit-wm>	 (PS1) Tim Landscheidt: haproxy: Move check_haproxy to module itself [puppet] - https://gerrit.wikimedia.org/r/228712 (https://phabricator.wikimedia.org/T87132)
[20:15:56] <icinga-wm>	 RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK No anomaly detected
[20:21:35] <grrrit-wm>	 (PS7) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[20:22:18] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646) (owner: Merlijn van Deen)
[20:24:36] <grrrit-wm>	 (PS8) Merlijn van Deen: [toollabs] add script to generate python package listings [puppet] - https://gerrit.wikimedia.org/r/228635 (https://phabricator.wikimedia.org/T101646)
[20:31:57] <icinga-wm>	 PROBLEM - puppet last run on cp3036 is CRITICAL puppet fail
[20:58:17] <icinga-wm>	 RECOVERY - puppet last run on cp3036 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures
[20:59:37] <grrrit-wm>	 (CR) Legoktm: "> Also does wgConf->get() stuff not work for settings in CommonSettings?" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[21:26:24] <grrrit-wm>	 (CR) Alex Monk: "https://phabricator.wikimedia.org/P1812" [mediawiki-config] - https://gerrit.wikimedia.org/r/228618 (https://phabricator.wikimedia.org/T90612) (owner: Legoktm)
[21:31:14] <grrrit-wm>	 (PS2) Alex Monk: Fix part of the VE NS config issue [mediawiki-config] - https://gerrit.wikimedia.org/r/228198 (https://phabricator.wikimedia.org/T104898)
[22:31:36] <icinga-wm>	 PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1426 bytes in 0.194 second response time
[22:43:27] <YuviPanda>	 hoo: you were working on the patch to rsync wikidata json dumps to labs, right?
[22:44:01] <hoo>	 Addshore was primarily, but I have been doing CR so yes
[22:44:42] <YuviPanda>	 hoo: link to the patch?
[22:44:45] <YuviPanda>	 (just curious)
[22:45:11] <hoo>	 https://gerrit.wikimedia.org/r/215585
[22:45:24] <hoo>	 I'm working on the above alert, btw
[23:02:38] <icinga-wm>	 RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1418 bytes in 0.184 second response time
[23:04:51] <YuviPanda>	 hoo: thanks
[23:06:10] <wikibugs>	 operations, Commons, MediaWiki-File-management, MediaWiki-Tarball-Backports, and 7 others: InstantCommons broken by switch to HTTPS - https://phabricator.wikimedia.org/T102566#1502005 (Tgr) You could try to cherry-pick https://gerrit.wikimedia.org/r/#/c/223518/ and set `$wgDebugLogGroups['http'] =...
[23:52:26] <icinga-wm>	 PROBLEM - puppet last run on mw1047 is CRITICAL Puppet has 1 failures
[23:59:14] <grrrit-wm>	 (CR) Alex Monk: [C: 2] Fix part of the VE NS config issue [mediawiki-config] - https://gerrit.wikimedia.org/r/228198 (https://phabricator.wikimedia.org/T104898) (owner: Alex Monk)
[23:59:38] <grrrit-wm>	 (Merged) jenkins-bot: Fix part of the VE NS config issue [mediawiki-config] - https://gerrit.wikimedia.org/r/228198 (https://phabricator.wikimedia.org/T104898) (owner: Alex Monk)