[00:20:47] Release-Engineering, operations, MediaWiki-extensions-Translate, Wikimedia-Extension-setup: ca.wikimedia wiki - sidebar in French won't work... - https://phabricator.wikimedia.org/T88843#1023159 (Krenair)
[00:53:45] operations, Wikimedia-Bugzilla: analyze Bugzilla access logs - https://phabricator.wikimedia.org/T86859#1023187 (Aklapper)
[02:11:37] !log l10nupdate Synchronized php-1.25wmf15/cache/l10n: (no message) (duration: 00m 01s)
[02:11:47] Logged the message, Master
[02:12:44] !log LocalisationUpdate completed (1.25wmf15) at 2015-02-08 02:11:41+00:00
[02:12:48] Logged the message, Master
[02:13:08] !log l10nupdate Synchronized php-1.25wmf16/cache/l10n: (no message) (duration: 00m 01s)
[02:13:11] Logged the message, Master
[02:14:15] !log LocalisationUpdate completed (1.25wmf16) at 2015-02-08 02:13:11+00:00
[02:14:18] Logged the message, Master
[03:35:50] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: puppet fail
[03:37:30] PROBLEM - puppet last run on amssq62 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:37:50] PROBLEM - puppet last run on analytics1018 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:38:40] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:48:30] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:50:31] PROBLEM - puppet last run on platinum is CRITICAL: CRITICAL: Puppet has 1 failures
[03:52:00] PROBLEM - puppet last run on es1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:53:53] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Feb 8 03:52:49 UTC 2015 (duration 52m 48s)
[03:53:59] Logged the message, Master
[03:55:39] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:58:30] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[03:58:40] PROBLEM - puppet last run on thallium is CRITICAL: CRITICAL: Puppet has 1 failures
[03:58:41] RECOVERY - puppet last run on analytics1018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[03:59:40] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[04:00:01] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:09] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:39] PROBLEM - puppet last run on analytics1038 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:00:40] PROBLEM - puppet last run on mw1013 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:00] PROBLEM - puppet last run on mw1160 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:00] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[04:01:01] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:20] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:20] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:49] RECOVERY - puppet last run on mw1013 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[04:01:50] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:01:50] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:09] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:02:09] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:02:10] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[04:02:20] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[04:03:10] RECOVERY - puppet last run on platinum is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[04:03:20] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[04:03:30] RECOVERY - puppet last run on es1008 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[04:04:00] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[04:04:11] RECOVERY - puppet last run on mw1160 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[04:05:10] RECOVERY - puppet last run on mw1099 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[04:06:10] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[04:07:00] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[04:07:59] RECOVERY - puppet last run on analytics1038 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[04:09:11] RECOVERY - puppet last run on thallium is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[04:09:20] RECOVERY - puppet last run on osmium is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[04:11:30] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[05:21:29] PROBLEM - puppet last run on stat1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:39:10] RECOVERY - puppet last run on stat1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[05:43:10] Continuous-Integration, operations: Jenkins is using php-luasandbox 1.9-1 for zend unit tests; precise should be upgraded to 2.0-7+wmf2.1 or equivalent - https://phabricator.wikimedia.org/T88798#1023260 (greg) p:Triage>Normal
[06:28:19] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:28:20] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:28:29] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: puppet fail
[06:28:29] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:00] PROBLEM - puppet last run on ms-fe2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:20] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:45:51] RECOVERY - puppet last run on ms-fe2001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:46:10] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:46:20] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:48:39] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[07:19:44] (PS1) Ori.livneh: vbench: use deferreds to avoid races and to be idiomatic [puppet] - https://gerrit.wikimedia.org/r/189305
[07:19:46] Multimedia, operations, MediaWiki-extensions-UploadWizard: Enable Extension:UploadWizard on id.wikipedia - https://phabricator.wikimedia.org/T88918#1023295 (Kenrick95) NEW
[09:26:00] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:40:43] Release-Engineering, operations, MediaWiki-extensions-Translate, Wikimedia-Extension-setup: ca.wikimedia wiki - sidebar in French won't work... - https://phabricator.wikimedia.org/T88843#1023346 (Nemo_bis) Hi Benoit, this is not an issue with Translate but with MediaWiki core caching of the sidebar: [[MediaW...
[09:42:50] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[10:29:17] (PS1) 01tonythomas: Un-subscribe frequently failing recipients [mediawiki-config] - https://gerrit.wikimedia.org/r/189316 (https://phabricator.wikimedia.org/T48640)
[11:15:58] operations: Increasing cache time - https://phabricator.wikimedia.org/T86505#1023430 (hoo) >>! In T86505#1023258, @Mjbmr wrote: > @hoo Can we ask wikimedia or we ourself or I myself change varnish configuration? is there a public repos for that? you did not respond to my last comment. Feel free to open a bug...
[11:16:16] m(
[11:16:20] phabricator -.-
[11:16:33] operations: Increasing cache time - https://phabricator.wikimedia.org/T86505#1023435 (hoo) Open>Invalid
[12:37:20] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:55:20] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[13:07:39] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[13:45:40] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 6 below the confidence bounds
[13:59:54] (CR) Nemo bis: "What's the default in mailman and in our install? http://www.list.org/mailman-admin/node25.html" [mediawiki-config] - https://gerrit.wikimedia.org/r/189316 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[14:10:01] (CR) John F. Lewis: "@Nemo_bis the threshold in mailman by default is 5 and this is unmodified for Wikimedia's installation." [mediawiki-config] - https://gerrit.wikimedia.org/r/189316 (https://phabricator.wikimedia.org/T48640) (owner: 01tonythomas)
[14:38:17] (PS1) Merlijn van Deen: Add documentation link to 'create bug by email' text. [puppet] - https://gerrit.wikimedia.org/r/189326 (https://phabricator.wikimedia.org/T865)
[14:40:21] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[14:44:41] (PS1) Merlijn van Deen: Change 'Export to Excel' to 'Export (disabled)' [puppet] - https://gerrit.wikimedia.org/r/189327 (https://phabricator.wikimedia.org/T152)
[14:45:21] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[14:49:02] (PS2) Merlijn van Deen: Change 'Export to Excel' to 'Export (disabled)' [puppet] - https://gerrit.wikimedia.org/r/189327 (https://phabricator.wikimedia.org/T152)
[14:53:05] (PS1) Merlijn van Deen: Change Blocking Tasks to 'Blocked By' Tasks [puppet] - https://gerrit.wikimedia.org/r/189329 (https://phabricator.wikimedia.org/T33)
[14:59:10] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:09:29] PROBLEM - HTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:13:39] RECOVERY - HTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5114 bytes in 9.516 second response time
[15:15:49] RECOVERY - puppet last run on elastic1025 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[15:16:49] PROBLEM - HTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:17:29] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.136:9200/_cluster/health error while fetching: Request timed out.
[15:19:40] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[15:22:09] RECOVERY - HTTP on dataset1001 is OK: HTTP OK: HTTP/1.1 200 OK - 5114 bytes in 6.542 second response time
[15:22:49] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 3, unassigned_shards: 0, timed_out: False, active_primary_shards: 45, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 135, initializing_shards: 0, number_of_data_nodes: 3
[15:25:59] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.136:9200/_cluster/health error while fetching: Request timed out.
[15:26:59] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 3, unassigned_shards: 1, timed_out: False, active_primary_shards: 45, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 133, initializing_shards: 1, number_of_data_nodes: 3
[15:30:59] (CR) Nemo bis: [C: 1] Change 'Export to Excel' to 'Export (disabled)' [puppet] - https://gerrit.wikimedia.org/r/189327 (https://phabricator.wikimedia.org/T152) (owner: Merlijn van Deen)
[15:39:19] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:57:09] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:18:51] (Abandoned) Tim Landscheidt: Tools: Use toollabs::hba in toollabs::webnode [puppet] - https://gerrit.wikimedia.org/r/145388 (owner: Tim Landscheidt)
[16:44:09] I'm seeing absolutely no response when uploading this video file through UploadWizard
[16:47:31] And nothing in logstash
[16:47:56] And I have little clue about how to navigate graphite, but my attempts have gotten me nothing
[17:34:16] <_joe_> marktraceur: you see nothing in logstash because logstash logging is disabled for now
[17:34:22] Oh.
[17:34:25] That would explain it
[17:34:31] <_joe_> (it's related to friday's outage)
[17:34:50] <_joe_> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150205-SiteOutage
[17:35:44] operations, Project-Creators, Phabricator, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1023648 (awight) What can we people without this permission do to group tasks together? I suppose I'll set tasks to block an umbrella, tracking task...
[17:52:08] YuviPanda: ping :)
[17:52:49] why is he even connected?
[18:08:40] PROBLEM - Disk space on praseodymium is CRITICAL: DISK CRITICAL - free space: /mnt/data 11264 MB (3% inode=99%):
[18:24:39] RECOVERY - Disk space on praseodymium is OK: DISK OK
[19:02:00] PROBLEM - puppet last run on amssq52 is CRITICAL: CRITICAL: Puppet has 2 failures
[19:02:29] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0]
[19:15:09] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:18:59] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[19:20:08] marktraceur: look in fluorine
[20:43:19] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 11328 MB (3% inode=99%):
[21:28:34] operations, Wikimedia-Stream: stream.wikimedia.org: Uneven distribution of client connections on backends - https://phabricator.wikimedia.org/T69957#1023841 (Andrew) p:Triage>Normal
[21:28:39] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 11177 MB (3% inode=99%):
[21:30:53] operations, Analytics: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#1023846 (Andrew) p:Triage>Normal a:Ottomata
[21:33:31] operations: Retire Torrus - https://phabricator.wikimedia.org/T87840#1023864 (Andrew) p:Triage>Normal
[21:33:58] operations, Tool-Labs: Replag on labsdb - https://phabricator.wikimedia.org/T88183#1023868 (Andrew) So, ok, it sounds like this is resolved?
[21:34:09] operations, Tool-Labs: Replag on labsdb - https://phabricator.wikimedia.org/T88183#1023869 (Andrew) p:Triage>High
[21:40:50] (Abandoned) Nemo bis: [gdash] Use logscale 10 for reqerror graph, again [puppet] - https://gerrit.wikimedia.org/r/117021 (https://bugzilla.wikimedia.org/41754) (owner: Nemo bis)
[21:44:24] operations: Cannot use dsh-based restart of parsoid from tin anymore - https://phabricator.wikimedia.org/T87803#1023889 (Andrew) p:Triage>Normal
[21:47:09] operations: Document Debian/Ubuntu security update procedure & command - https://phabricator.wikimedia.org/T88469#1023891 (Andrew) p:Triage>High
[21:48:15] operations, MediaWiki-Core-Team: Store unsampled API and XFF logs - https://phabricator.wikimedia.org/T88393#1023892 (Andrew) p:Triage>Normal
[21:50:06] operations: Make Puppet repository pass lenient and strict lint checks - https://phabricator.wikimedia.org/T87132#1023895 (Andrew) p:Triage>Normal a:Andrew
[21:50:31] operations: 503 on http://wikiversity.org/ - https://phabricator.wikimedia.org/T88774#1023897 (Andrew) p:Triage>Normal
[21:51:31] operations, ops-eqiad: cp1070 hardware failure - https://phabricator.wikimedia.org/T88889#1023899 (Andrew) p:Triage>High
[21:51:49] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 11262 MB (3% inode=99%):
[21:57:04] (PS2) Nemo bis: Enable Collection by default on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://bugzilla.wikimedia.org/71416) (owner: Reedy)
[21:59:45] (CR) Nemo bis: "Rebased" [mediawiki-config] - https://gerrit.wikimedia.org/r/165490 (https://bugzilla.wikimedia.org/71416) (owner: Reedy)
[22:04:29] PROBLEM - Disk space on xenon is CRITICAL: DISK CRITICAL - free space: /mnt/data 11320 MB (3% inode=99%):
[22:12:45] (PS3) Nemo bis: Rsyncing slow-parse logs from fluorine to dumps.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[22:12:51] (CR) Nemo bis: "Rebased" [puppet] - https://gerrit.wikimedia.org/r/49678 (owner: Ottomata)
[22:39:20] PROBLEM - Disk space on praseodymium is CRITICAL: DISK CRITICAL - free space: /mnt/data 11247 MB (3% inode=99%):
[23:07:30] RECOVERY - Disk space on xenon is OK: DISK OK
[23:19:20] RECOVERY - Disk space on praseodymium is OK: DISK OK
[23:27:19] PROBLEM - HTTP on dataset1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds