[00:14:39] <icinga-wm>	 PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.599 second response time
[00:17:20] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:17:29] <icinga-wm>	 RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.151 second response time
[00:17:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:18:19] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1197 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 8.574 second response time
[00:18:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 73062 bytes in 1.422 second response time
[00:43:09] <icinga-wm>	 PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:52:39] <icinga-wm>	 PROBLEM - HHVM rendering on mw1197 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:54:03] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3073585 (Legoktm) 00:47, 5 March 2017 Sjoerddebruin (talk | contribs) blocked Emijrpbot (talk | contribs) with an expiration time of indefinite (account creation disabled, a...
[00:54:29] <icinga-wm>	 RECOVERY - HHVM rendering on mw1197 is OK: HTTP OK: HTTP/1.1 200 OK - 73034 bytes in 1.759 second response time
[01:01:39] <icinga-wm>	 PROBLEM - Check systemd state on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:02:29] <icinga-wm>	 RECOVERY - Check systemd state on cobalt is OK: OK - running: The system is fully operational
[01:03:29] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:19] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on cobalt is OK: OK ferm input default policy is set
[01:06:19] <icinga-wm>	 PROBLEM - Check systemd state on conf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:06:39] <icinga-wm>	 PROBLEM - etcdmirror--eqiad-wmnet service on conf2002 is CRITICAL: CRITICAL - Expecting active but unit etcdmirror--eqiad-wmnet is failed
[01:07:09] <icinga-wm>	 PROBLEM - dhclient process on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:09] <icinga-wm>	 PROBLEM - configured eth on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:10] <icinga-wm>	 PROBLEM - DPKG on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:10] <icinga-wm>	 PROBLEM - Etcd replication lag on conf2002 is CRITICAL: connect to address 10.192.32.141 and port 8000: Connection refused
[01:07:19] <icinga-wm>	 PROBLEM - puppet last run on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:29] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:08:09] <icinga-wm>	 PROBLEM - gerrit process on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:08:19] <icinga-wm>	 PROBLEM - salt-minion processes on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:08:39] <icinga-wm>	 PROBLEM - MD RAID on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:08:40] <icinga-wm>	 PROBLEM - Disk space on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:08:59] <icinga-wm>	 RECOVERY - dhclient process on cobalt is OK: PROCS OK: 0 processes with command name dhclient
[01:08:59] <icinga-wm>	 RECOVERY - gerrit process on cobalt is OK: PROCS OK: 1 process with regex args ^GerritCodeReview .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war
[01:08:59] <icinga-wm>	 RECOVERY - DPKG on cobalt is OK: All packages OK
[01:08:59] <icinga-wm>	 RECOVERY - configured eth on cobalt is OK: OK - interfaces up
[01:09:09] <icinga-wm>	 RECOVERY - salt-minion processes on cobalt is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:09:09] <icinga-wm>	 RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 11 minutes ago with 0 failures
[01:09:19] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on cobalt is OK: OK ferm input default policy is set
[01:09:49] <icinga-wm>	 PROBLEM - Check size of conntrack table on cobalt is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:09] <icinga-wm>	 RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[01:10:29] <icinga-wm>	 RECOVERY - MD RAID on cobalt is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[01:10:29] <icinga-wm>	 RECOVERY - Disk space on cobalt is OK: DISK OK
[01:10:39] <icinga-wm>	 RECOVERY - Check size of conntrack table on cobalt is OK: OK: nf_conntrack is 0 % full
[01:11:09] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on kubernetes1002 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[01:12:09] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on kubernetes1002 is OK: OK ferm input default policy is set
[01:13:09] <icinga-wm>	 PROBLEM - puppet last run on es1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:41:09] <icinga-wm>	 RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[01:43:19] <icinga-wm>	 PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100%
[01:44:19] <icinga-wm>	 RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.27 ms
[02:06:45] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3073330 (Betacommand) Suggestion, when the job queue gets too high set the maxlag parameter to a higher value, most bots use that as a throttle.
[02:12:09] <icinga-wm>	 PROBLEM - puppet last run on elastic1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:18:42] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.14) (duration: 07m 07s)
[02:18:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:24:02] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Mar  5 02:24:02 UTC 2017 (duration 5m 20s)
[02:24:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:35:51] <wikibugs_>	 (PS1) Madhuvishy: paws-internal: Add support for serving static files authenticated via ldap [puppet] - https://gerrit.wikimedia.org/r/341188
[02:37:15] <wikibugs_>	 (CR) jerkins-bot: [V: -1] paws-internal: Add support for serving static files authenticated via ldap [puppet] - https://gerrit.wikimedia.org/r/341188 (owner: Madhuvishy)
[02:39:09] <icinga-wm>	 RECOVERY - puppet last run on elastic1036 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[02:40:06] <wikibugs_>	 (PS2) Madhuvishy: paws-internal: Add support for serving static files authenticated via ldap [puppet] - https://gerrit.wikimedia.org/r/341188
[02:45:41] <wikibugs_>	 (PS3) Madhuvishy: paws-internal: Add support for serving static files authenticated via ldap [puppet] - https://gerrit.wikimedia.org/r/341188
[02:49:31] <wikibugs_>	 (CR) Madhuvishy: [C: 2] paws-internal: Add support for serving static files authenticated via ldap [puppet] - https://gerrit.wikimedia.org/r/341188 (owner: Madhuvishy)
[02:53:59] <icinga-wm>	 PROBLEM - Check systemd state on notebook1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[02:54:29] <icinga-wm>	 PROBLEM - puppet last run on notebook1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[03:02:19] <icinga-wm>	 PROBLEM - puppet last run on notebook1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2]
[03:02:59] <icinga-wm>	 PROBLEM - Check systemd state on notebook1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:09:53] <wikibugs_>	 (PS1) Madhuvishy: paws-internal: Add apache ldap auth puppet dependencies [puppet] - https://gerrit.wikimedia.org/r/341189
[03:11:15] <wikibugs_>	 (CR) Madhuvishy: [C: 2] paws-internal: Add apache ldap auth puppet dependencies [puppet] - https://gerrit.wikimedia.org/r/341189 (owner: Madhuvishy)
[03:13:29] <icinga-wm>	 RECOVERY - puppet last run on notebook1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[03:13:59] <icinga-wm>	 RECOVERY - Check systemd state on notebook1001 is OK: OK - running: The system is fully operational
[03:13:59] <icinga-wm>	 RECOVERY - Check systemd state on notebook1002 is OK: OK - running: The system is fully operational
[03:14:19] <icinga-wm>	 RECOVERY - puppet last run on notebook1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[03:22:39] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 669.91 seconds
[03:22:39] <icinga-wm>	 PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:28:39] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 64.95 seconds
[03:33:19] <icinga-wm>	 PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:33:29] <icinga-wm>	 PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:51:40] <icinga-wm>	 RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[04:00:29] <icinga-wm>	 RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[04:01:19] <icinga-wm>	 RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[04:46:09] <icinga-wm>	 PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:02:19] <icinga-wm>	 PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:14:09] <icinga-wm>	 RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[05:24:09] <icinga-wm>	 PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:30:19] <icinga-wm>	 RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[05:34:19] <icinga-wm>	 PROBLEM - puppet last run on restbase1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:53:09] <icinga-wm>	 RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[05:55:29] <icinga-wm>	 PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:02:19] <icinga-wm>	 RECOVERY - puppet last run on restbase1018 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:02:47] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3073686 (Legoktm) p:Unbreak!>High Going down slowly...
[06:08:10] <icinga-wm>	 PROBLEM - puppet last run on labvirt1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:23:29] <icinga-wm>	 RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:32:09] <icinga-wm>	 PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:36:09] <icinga-wm>	 RECOVERY - puppet last run on labvirt1006 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:54:39] <icinga-wm>	 PROBLEM - puppet last run on maerlant is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:01:09] <icinga-wm>	 RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[07:11:09] <icinga-wm>	 PROBLEM - puppet last run on es1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:23:30] <icinga-wm>	 RECOVERY - puppet last run on maerlant is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:39:09] <icinga-wm>	 RECOVERY - puppet last run on es1017 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[07:41:17] <wikibugs_>	 (PS1) Yuvipanda: tools: Allow readonly access to all namespace objects [puppet] - https://gerrit.wikimedia.org/r/341191
[07:41:43] <wikibugs_>	 (PS2) Yuvipanda: tools: Allow readonly access to all namespace objects [puppet] - https://gerrit.wikimedia.org/r/341191
[07:53:33] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3073330 (Emijrp) All my bots follow the maxlag policy, as defined by default in Pywikibot user-config.py.
[08:07:39] <icinga-wm>	 PROBLEM - puppet last run on logstash1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:36:25] <wikibugs_>	 (PS3) Yuvipanda: tools: Allow readonly access to all namespace objects [puppet] - https://gerrit.wikimedia.org/r/341191
[08:36:39] <icinga-wm>	 RECOVERY - puppet last run on logstash1006 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[09:50:31] <wikibugs_>	 (PS1) MarcoAurelio: Add 'flow-create-board' to CommonSettings.php for global groups [mediawiki-config] - https://gerrit.wikimedia.org/r/341193
[09:54:29] <icinga-wm>	 PROBLEM - Hadoop DataNode on analytics1028 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[09:57:09] <icinga-wm>	 PROBLEM - puppet last run on analytics1028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[hadoop-hdfs-datanode]
[10:10:49] <elukey>	 this one is a disk issue --^
[10:17:11] <wikibugs_>	 Operations, ops-eqiad, Analytics, Analytics-Cluster, DC-Ops: Analytics1028 hdfs daemon died because of disk errors - https://phabricator.wikimedia.org/T159632#3073705 (elukey)
[10:19:06] <elukey>	 !log disabled puppet on analytics1028 to prevent puppet from starting the HDFS daemon (T159632)
[10:19:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:12] <stashbot>	 T159632: Analytics1028 hdfs daemon died because of disk errors - https://phabricator.wikimedia.org/T159632
[10:20:23] <elukey>	 this host is a bit important since it is one of the three Hadoop HDFS journal nodes, but the HDFS daemon seems to be the only one impacted
[10:23:36] <elukey>	 so I stopped the Yarn nodemanager too, scheduled downtime and left the journalnode daemon up and running (since it seems to be working fine)
[10:24:09] <elukey>	 not sure if the disk will be swapped soon next week, so tomorrow I'll move the journalnode to analytics1029 probably
[10:24:14] <elukey>	 but for the moment everything seems fine
[10:24:28] * elukey sending an email to analytics
[10:25:10] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:25:11] <wikibugs_>	 Operations, ops-eqiad, Analytics, Analytics-Cluster, DC-Ops: Analytics1028 hdfs daemon died because of disk errors - https://phabricator.wikimedia.org/T159632#3073732 (elukey) I also stopped the Yarn node manager but not the journalnode, will probably move it to analytics1029 tomorrow.
[10:30:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:34:47] <elukey>	 all right done, everything seems good
[10:35:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:40:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:45:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:50:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[10:55:09] <icinga-wm>	 RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 268 seconds ago with 0 failures
[11:20:22] <wikibugs_>	 (PS1) Mbch331: WIP: Remove exception on Other Projects sidebar for Dutch Wikipedia Bug: T159634 [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634)
[12:03:39] <wikibugs_>	 (PS2) Urbanecm: WIP: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) (owner: Mbch331)
[12:05:29] <wikibugs_>	 (CR) Urbanecm: [C: -1] "Without consensus, clarifying in the task." [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) (owner: Mbch331)
[12:26:55] <wikibugs_>	 (PS3) Mbch331: WIP: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634)
[12:29:19] <icinga-wm>	 PROBLEM - puppet last run on mc1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:30:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[12:35:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[12:40:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[12:45:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[12:50:09] <icinga-wm>	 RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[12:57:19] <icinga-wm>	 RECOVERY - puppet last run on mc1022 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[13:02:19] <icinga-wm>	 PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:12:09] <icinga-wm>	 PROBLEM - puppet last run on restbase1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:16:08] <wikibugs_>	 Operations, Wikimedia-General-or-Unknown: GenerateFancyCaptchas cronjob should output to logfile - https://phabricator.wikimedia.org/T159610#3073831 (Florian) a:Florian
[13:19:40] <wikibugs_>	 Operations, Puppet: GenerateFancyCaptchas cronjob should output to logfile - https://phabricator.wikimedia.org/T159610#3073832 (Florian)
[13:31:19] <icinga-wm>	 RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[13:32:52] <wikibugs_>	 (PS1) Florianschmidtwelzow: Save logs of generate CAPTCHA cron to /var/log/mediawiki [puppet] - https://gerrit.wikimedia.org/r/341197 (https://phabricator.wikimedia.org/T159610)
[13:41:09] <icinga-wm>	 RECOVERY - puppet last run on restbase1017 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[13:57:39] <icinga-wm>	 PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:22:40] <icinga-wm>	 PROBLEM - salt-minion processes on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:23:29] <icinga-wm>	 RECOVERY - salt-minion processes on thumbor1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:26:39] <icinga-wm>	 RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[14:49:19] <icinga-wm>	 RECOVERY - Check systemd state on conf2002 is OK: OK - running: The system is fully operational
[14:52:21] <icinga-wm>	 PROBLEM - Check systemd state on conf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:07:39] <icinga-wm>	 PROBLEM - puppet last run on prometheus1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:36:39] <icinga-wm>	 RECOVERY - puppet last run on prometheus1002 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[15:48:09] <icinga-wm>	 PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:13:09] <icinga-wm>	 PROBLEM - puppet last run on es1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:17:09] <icinga-wm>	 RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[16:40:09] <icinga-wm>	 RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[17:23:09] <icinga-wm>	 PROBLEM - puppet last run on wdqs1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:34:36] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3073982 (Legoktm) >>! In T159618#3073691, @Emijrp wrote: > All my bots follow the maxlag policy, as defined by default in Pywikibot user-config.py.  Can you add a ratelimit...
[17:45:59] <icinga-wm>	 PROBLEM - Ensure mysql credential creation for tools users is running on labstore1005 is CRITICAL: CRITICAL - Expecting active but unit maintain-dbusers is failed
[17:46:39] <icinga-wm>	 PROBLEM - Check systemd state on labstore1005 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:47:29] <icinga-wm>	 PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 648 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4223399 keys, up 125 days 9 hours - replication_delay is 648
[17:47:29] <icinga-wm>	 PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 650 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4223445 keys, up 125 days 9 hours - replication_delay is 650
[17:51:09] <icinga-wm>	 RECOVERY - puppet last run on wdqs1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[18:00:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[18:05:09] <icinga-wm>	 PROBLEM - check_puppetrun on lutetium is CRITICAL: CRITICAL: puppet fail
[18:09:39] <icinga-wm>	 RECOVERY - Check systemd state on labstore1005 is OK: OK - running: The system is fully operational
[18:09:59] <icinga-wm>	 RECOVERY - Ensure mysql credential creation for tools users is running on labstore1005 is OK: OK - maintain-dbusers is active
[18:10:09] <icinga-wm>	 RECOVERY - check_puppetrun on lutetium is OK: OK: Puppet is currently enabled, last run 213 seconds ago with 0 failures
[18:10:29] <icinga-wm>	 PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:15:09] <icinga-wm>	 PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:38:29] <icinga-wm>	 RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[18:43:09] <icinga-wm>	 RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[18:47:03] <matanya>	 Dereckson: are you around ?
[18:55:33] <matanya>	 anyone here that wants/can do server side upload ?
[19:11:23] <Dereckson>	 hi matanya, how can I help you?
[19:24:35] <matanya>	 Dereckson: hi, i'd like to do a server side upload, would a ticket be the right place ?
[19:26:25] <bawolff>	 Generally speaking a ticket is the right place afaik
[19:27:43] <bawolff>	 I'd note that, in theory, the size limit for server side upload is the same as for normal uploads nowadays
[19:28:25] <matanya>	 but you can't upload stuff automatically if there are a gazillion files
[19:31:38] <bawolff>	 That is true
[19:32:43] <Dereckson>	 matanya: feel free to create a task and put me as subscriber, I'll handle it
[19:33:12] <matanya>	 Dereckson: https://phabricator.wikimedia.org/T159650
[19:34:22] <Dereckson>	 matanya: how can I transfer from encoding01.eqiad.wmflabs to Terbium?
[19:34:40] <matanya>	 Dereckson: scp would do, i guess
[19:38:29] <Dereckson>	 matanya: ask zhuyifei1999_ there are some tricks to share it on the web
[19:40:36] <matanya>	 Dereckson: will it help ? i think you should have the right to pull them, and if not, i can handle the sharing, i guess
[19:40:44] <zhuyifei1999_>	 matanya: one way would be to reuse the urls that serve v2c files
[19:41:03] <matanya>	 zhuyifei1999_: it is in /srv/matanya on encoding01
[19:41:18] <matanya>	 what should be done to re-use it ?
[19:41:59] <zhuyifei1999_>	 copy to /srv/v2c/ssu?
[19:42:21] <zhuyifei1999_>	 or you can get nginx to point to /srv/matanya
[19:42:30] <zhuyifei1999_>	 add a proxy
[19:43:19] * zhuyifei1999_ gtg
[19:43:28] <matanya>	 thanks zhuyifei1999_
[19:43:37] <zhuyifei1999_>	 np
[19:44:22] * zhuyifei1999_ gtg
[19:48:54] <matanya>	 Dereckson: shared
[19:58:24] <wikibugs_>	 (PS3) Urbanecm: Bs.wiktionary namespace changes [mediawiki-config] - https://gerrit.wikimedia.org/r/341035 (https://phabricator.wikimedia.org/T159538)
[20:30:27] <wikibugs_>	 (PS1) Brian Wolff: Extend the upload Content-Security-Policy test to other large wikis [puppet] - https://gerrit.wikimedia.org/r/341207 (https://phabricator.wikimedia.org/T117618)
[20:39:05] <matanya>	 Dereckson: i updated the ticket with the info on the web
[20:40:43] <Dereckson>	 ok
[20:47:29] <icinga-wm>	 RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4195505 keys, up 125 days 12 hours - replication_delay is 2
[20:49:19] <icinga-wm>	 RECOVERY - Check systemd state on conf2002 is OK: OK - running: The system is fully operational
[20:52:19] <icinga-wm>	 PROBLEM - Check systemd state on conf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:57:29] <icinga-wm>	 PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 602 600 - REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 4195505 keys, up 125 days 12 hours - replication_delay is 602
[20:59:09] <icinga-wm>	 PROBLEM - puppet last run on wtp1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:27:09] <icinga-wm>	 RECOVERY - puppet last run on wtp1005 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[22:17:51] <wikibugs_>	 (PS4) Paladox: Gerrit: Add some apache rewrite rules for polygerrit [puppet] - https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120)
[22:17:59] <wikibugs_>	 (PS1) Brian Wolff: Add a CSP policy to foundationwiki to prevent privacy breach [mediawiki-config] - https://gerrit.wikimedia.org/r/341259 (https://phabricator.wikimedia.org/T159386)
[22:18:01] <wikibugs_>	 (PS5) Paladox: Gerrit: Add some apache rewrite rules for polygerrit [puppet] - https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120)
[22:18:09] <Dereckson>	 matanya: ping?
[22:18:38] <matanya>	 yes Dereckson ?
[22:18:44] <Dereckson>	 matanya: https://commons.wikimedia.org/wiki/File:MAZEN_DAOUD.jpg
[22:19:09] <Dereckson>	 apparently https://tools.wmflabs.org/video2commons/static/ssu/ABBAS_ASSI.jpg.txt is 404
[22:19:12] <matanya>	 i am fixing it
[22:19:52] <Dereckson>	 What syntax did you use for the description filenames?
[22:20:35] <matanya>	 what do you mean ?
[22:20:55] <wikibugs_>	 (PS6) Paladox: Gerrit: Add some apache rewrite rules for polygerrit [puppet] - https://gerrit.wikimedia.org/r/340900 (https://phabricator.wikimedia.org/T156120)
[22:21:15] <Dereckson>	 The upload script expects the original filename + a suffix e.g. ABBAS_ASSI.jpg.txt for ABBAS_ASSI.jpg
[22:21:24] <matanya>	 true
[22:21:50] <Dereckson>	 So I followed your "Description files are available too: append .txt to the images."
[22:22:05] <matanya>	 ah, i have them as separated files, sorry
[22:22:25] <matanya>	 they are named like ABBAS_ASSI.txt
[22:22:32] <Dereckson>	 ok
[22:22:42] <matanya>	 i can change that if you wish
[22:22:57] <Dereckson>	 Yes, rename them, I'll resume the upload, with the correct description files.
[22:23:08] <Dereckson>	 But for the already uploaded files, you'll need to fix that manually.
[22:23:15] <matanya>	 ok
[22:23:28] <Dereckson>	 The script can't help: it only publishes a revision for new uploads, not for reuploads.
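The naming convention discussed above (description file = image filename + `.txt`, e.g. `ABBAS_ASSI.jpg.txt` for `ABBAS_ASSI.jpg`) could be applied with something like the sketch below. It assumes the description files sit in the same directory as the images and that every image is a `.jpg`; the function name is hypothetical, and this is not the script actually used for the upload.

```python
from pathlib import Path


def rename_description_files(directory: str) -> list:
    """Rename NAME.txt to NAME.jpg.txt when NAME.jpg exists alongside it."""
    renamed = []
    # sorted() materializes the listing first, so renaming while looping is safe.
    for txt in sorted(Path(directory).glob("*.txt")):
        # Skip files that already follow the NAME.jpg.txt convention.
        if txt.name.endswith(".jpg.txt"):
            continue
        image = txt.with_suffix(".jpg")              # ABBAS_ASSI.txt -> ABBAS_ASSI.jpg
        target = txt.with_name(image.name + ".txt")  # -> ABBAS_ASSI.jpg.txt
        # Only rename when the matching image exists and nothing would be overwritten.
        if image.exists() and not target.exists():
            txt.rename(target)
            renamed.append(target.name)
    return renamed
```

Description files without a matching image are left untouched, so they can be reviewed by hand.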
[22:23:41] <Reedy>	 !log Generating some more captchas again T159581
[22:23:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:23:47] <stashbot>	 T159581: The same CAPTCHA image is always used across platforms and refresh - https://phabricator.wikimedia.org/T159581
[22:32:25] <wikibugs_>	 Operations, Gerrit: Decide how to support polygerrit - https://phabricator.wikimedia.org/T158479#3074214 (Paladox) I've managed to fix it upstream at https://gerrit-review.googlesource.com/#/c/99004/ :)
[22:32:39] <icinga-wm>	 PROBLEM - puppet last run on wtp1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:35:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[22:40:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[22:40:53] <Zppix>	 ^^ is that intentional?
[22:41:29] <Zppix>	 jouncebot: refresh
[22:41:36] <jouncebot>	 I refreshed my knowledge about deployments.
[22:44:52] <matanya>	 Dereckson: if i delete the pages, will it re-upload ?
[22:45:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[22:45:37] <Dereckson>	 matanya: er yes
[22:45:57] <matanya>	 so i prefer this way, if you don't mind
[22:46:32] * Dereckson nods
[22:46:44] <Dereckson>	 will be faster indeed
[22:47:41] <matanya>	 doing shortly
[22:50:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[22:51:53] <Dereckson>	 I've added a -f to my curl alias, so it won't download 404 pages next time.
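For context, curl's `-f` (`--fail`) flag makes curl return an error on HTTP 4xx/5xx responses instead of saving the error page as if it were the requested file. A rough Python equivalent of that behaviour is sketched below; the helper name `fetch_or_skip` is hypothetical, and this is not the alias mentioned above.

```python
import urllib.error
import urllib.request


def fetch_or_skip(url: str, dest: str) -> bool:
    """Save url to dest; on a failed request write nothing and return False."""
    try:
        with urllib.request.urlopen(url) as response:
            body = response.read()
    except urllib.error.URLError:
        # HTTPError (e.g. a 404) is a subclass of URLError, so an error
        # page is never written to disk.
        return False
    with open(dest, "wb") as out:
        out.write(body)
    return True
```

A caller can then skip the description for any image whose `.txt` URL returns False rather than uploading a 404 page as the description.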
[22:55:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[22:57:16] <matanya>	 thanks
[22:59:03] <Dereckson>	 matanya: I think I've successfully deleted them (with Special:Nuke)
[22:59:15] <matanya>	 yes, i can confirm that
[22:59:48] <matanya>	 please wait while the correct desc files are being re-created
[22:59:52] <Dereckson>	 okay, so I'm ready to reupload as soon as you've renamed *.txt to *.jpg.txt
[22:59:58] <matanya>	 a matter of a minute or two
[23:00:00] * Dereckson nods
[23:00:09] <icinga-wm>	 PROBLEM - check_puppetrun on db1025 is CRITICAL: CRITICAL: puppet fail
[23:00:39] <icinga-wm>	 RECOVERY - puppet last run on wtp1022 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[23:05:09] <icinga-wm>	 RECOVERY - check_puppetrun on db1025 is OK: OK: Puppet is currently enabled, last run 297 seconds ago with 0 failures
[23:09:37] <wikibugs_>	 Operations, MediaWiki-JobQueue, Wikidata: Job queue rising to nearly 3 million jobs - https://phabricator.wikimedia.org/T159618#3074250 (Emijrp) Done. I added a put_throttle = 3 seconds. Anyway I think I will wait for the job queue to go down.
[23:11:58] <wikibugs_>	 (PS1) Brian Wolff: Change account creation throttle for idwiki to default (6) [mediawiki-config] - https://gerrit.wikimedia.org/r/341263
[23:18:49] <matanya>	 Dereckson: done
[23:18:58] <matanya>	 took longer than i thought it would
[23:21:48] <Dereckson>	 k
[23:22:51] <matanya>	 Dereckson: i see another issue
[23:22:58] <matanya>	 did you start ?
[23:23:19] <Dereckson>	 Not yet, you can still fix it.
[23:26:39] <matanya>	 ok Dereckson it is a go
[23:26:54] <matanya>	 i hope i spotted them all
[23:29:05] <Dereckson>	 ok
[23:32:43] <Dereckson>	 https://commons.wikimedia.org/wiki/File:YEHEZKEL_HAREL.jpg #better
[23:34:36] <Dereckson>	 matanya: strange, after https://phabricator.wikimedia.org/T159650#3074285 it stopped oO
[23:35:25] <matanya>	 Dereckson: any error message ?
[23:35:58] <Dereckson>	 nope
[23:36:41] <matanya>	 maybe just try again ?
[23:36:47] <Dereckson>	 I've rm'd the uploaded files and launched again, yes
[23:42:51] <Dereckson>	 matanya: I confirm my log is full blue, so all deleted files have been reuploaded
[23:43:07] <matanya>	 thank you so much!
[23:43:33] <matanya>	 for your help, your patience and kindness :) 
[23:43:39] <Dereckson>	 You're welcome.