[00:00:52] (PS10) Yuvipanda: tools: Use docker engine profile in tools builder [puppet] - https://gerrit.wikimedia.org/r/335299
[00:09:19] (PS6) Dzahn: graphoid/gridengine/grub/haproxy/hhvm lint fixes [puppet] - https://gerrit.wikimedia.org/r/334319 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[00:38:06] PROBLEM - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:42:12] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/5344/" [puppet] - https://gerrit.wikimedia.org/r/334319 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[00:48:15] (PS6) Dzahn: mysql: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334298 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[00:48:24] (CR) Dzahn: [C: 2] mysql: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334298 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[01:06:06] RECOVERY - puppet last run on ganeti1003 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[01:36:08] (PS1) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[01:37:36] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:57:06] (PS2) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[01:58:52] (PS3) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[02:01:39] (PS4) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[02:06:36] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[02:08:06] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.133 second response time
[02:11:06] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.165 second response time
[02:23:10] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.10) (duration: 07m 53s)
[02:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:28:33] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Feb 4 02:28:33 UTC 2017 (duration 5m 23s)
[02:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:47:12] (PS5) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[03:04:16] PROBLEM - puppet last run on cp3049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:13:47] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:25:00] (PS6) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[03:32:16] RECOVERY - puppet last run on cp3049 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[03:41:46] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[03:56:11] (PS1) Yuvipanda: tools: Fix puppet on docker builder hosts [puppet] - https://gerrit.wikimedia.org/r/335970
[03:57:32] (CR) jerkins-bot: [V: -1] tools: Fix puppet on docker builder hosts [puppet] - https://gerrit.wikimedia.org/r/335970 (owner: Yuvipanda)
[03:57:56] (PS2) Yuvipanda: tools: Fix puppet on docker builder hosts [puppet] - https://gerrit.wikimedia.org/r/335970
[04:07:49] (PS11) Yuvipanda: tools: Use docker engine profile in tools builder [puppet] - https://gerrit.wikimedia.org/r/335299
[04:07:51] (PS7) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[04:07:53] (PS3) Yuvipanda: tools: Fix puppet on docker builder hosts [puppet] - https://gerrit.wikimedia.org/r/335970
[04:09:36] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=311.70 Read Requests/Sec=1795.50 Write Requests/Sec=5.90 KBytes Read/Sec=23867.60 KBytes_Written/Sec=2372.40
[04:16:24] (PS12) Yuvipanda: tools: Use docker engine profile in tools builder [puppet] - https://gerrit.wikimedia.org/r/335299
[04:16:27] (PS8) Yuvipanda: tools: Use docker profile for k8s worker nodes [puppet] - https://gerrit.wikimedia.org/r/335957
[04:16:28] (PS4) Yuvipanda: tools: Fix puppet on docker builder hosts [puppet] - https://gerrit.wikimedia.org/r/335970
[04:20:06] (PS1) Dzahn: wikistats: cron for automatic miraheze table update [puppet] - https://gerrit.wikimedia.org/r/335971
[04:20:36] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=3.80 Read Requests/Sec=0.70 Write Requests/Sec=19.70 KBytes Read/Sec=2.80 KBytes_Written/Sec=2709.20
[04:20:53] (PS1) Yuvipanda: tools: Turn on docker live-migrate for docker builder [puppet] - https://gerrit.wikimedia.org/r/335972 (https://phabricator.wikimedia.org/T157180)
[04:20:55] (CR) Dzahn: [C: -1] "WIP" [puppet] - https://gerrit.wikimedia.org/r/335971 (owner: Dzahn)
[04:21:06] (CR) jerkins-bot: [V: -1] wikistats: cron for automatic miraheze table update [puppet] - https://gerrit.wikimedia.org/r/335971 (owner: Dzahn)
[04:21:33] (PS2) Dzahn: wikistats: cron for automatic miraheze table update [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930)
[04:21:48] (CR) Dzahn: [C: -2] wikistats: cron for automatic miraheze table update [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930) (owner: Dzahn)
[04:22:32] (CR) jerkins-bot: [V: -1] wikistats: cron for automatic miraheze table update [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930) (owner: Dzahn)
[04:23:50] (PS3) Dzahn: wikistats: cron for automatic miraheze table update (WIP) [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930)
[04:24:46] (CR) jerkins-bot: [V: -1] wikistats: cron for automatic miraheze table update (WIP) [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930) (owner: Dzahn)
[04:25:35] (PS4) Dzahn: wikistats: cron for automatic miraheze table update (WIP) [puppet] - https://gerrit.wikimedia.org/r/335971
[04:27:07] (PS5) Dzahn: wikistats: cron for automatic miraheze table update (WIP) [puppet] - https://gerrit.wikimedia.org/r/335971 (https://phabricator.wikimedia.org/T153930)
[04:44:32] Operations, MediaWiki-extensions-PageAssessments: PageAssessments can't work on enwiki - https://phabricator.wikimedia.org/T157179#2998272 (TTO)
[04:45:05] Operations, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998206 (TTO)
[04:48:40] Operations, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998206 (JustBerry) Reference material: https://wikiapiary.com/wiki/Extension:PageAssessments Wikipedia (en) 1.29.0-wmf.9 (MediaWiki version) 1.1.0 (Extension name)
[04:51:03] Operations, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998206 (JustBerry) Added extension authors as subscribers.
[04:57:49] Operations, Community-Tech, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998299 (Niharika) Thanks for reporting, @shizhao. {T156198} seems related.
[05:00:14] Operations, Community-Tech, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998303 (JustBerry) @Niharika T156198 was marked as resolved in Jan, despite persisting problems even after patches. @kaldari May need another review?
[05:04:06] (PS1) Yuvipanda: tools: Remove unused docker related files [puppet] - https://gerrit.wikimedia.org/r/335974
[05:06:05] Operations, Community-Tech, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998317 (Niharika)
[05:06:36] Operations, Community-Tech, MediaWiki-extensions-PageAssessments: Special:PageAssessments times out on enwiki - https://phabricator.wikimedia.org/T157179#2998320 (kaldari)
[05:08:40] (PS1) Yuvipanda: tools: Update infrastructure message to match reality [puppet] - https://gerrit.wikimedia.org/r/335975
[05:10:04] (CR) Yuvipanda: [C: 2] tools: Update infrastructure message to match reality [puppet] - https://gerrit.wikimedia.org/r/335975 (owner: Yuvipanda)
[05:21:56] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:22:46] PROBLEM - puppet last run on labvirt1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:50:56] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[05:51:06] RECOVERY - puppet last run on labvirt1007 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[06:00:06] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:03:16] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[06:03:26] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[06:05:56] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0]
[06:08:56] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0]
[06:09:56] PROBLEM - puppet last run on analytics1049 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:28:06] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:38:56] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:50:56] PROBLEM - MD RAID on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:54:56] PROBLEM - very high load average likely xfs on ms-be1012 is CRITICAL: CRITICAL - load average: 171.32, 107.52, 52.06
[06:55:06] PROBLEM - SSH on ms-be1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:55:16] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:16] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:55:26] PROBLEM - configured eth on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:57:06] PROBLEM - DPKG on ms-be1012 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:59:06] RECOVERY - DPKG on ms-be1012 is OK: All packages OK
[06:59:06] RECOVERY - SSH on ms-be1012 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[06:59:06] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 19 minutes ago with 0 failures
[06:59:06] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1012 is OK: OK ferm input default policy is set
[06:59:16] RECOVERY - configured eth on ms-be1012 is OK: OK - interfaces up
[07:03:56] RECOVERY - very high load average likely xfs on ms-be1012 is OK: OK - load average: 10.92, 70.57, 68.84
[07:06:16] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[07:08:26] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0
[07:12:06] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[xfs_label-/dev/sdn3]
[07:49:18] Operations, ops-codfw, DBA: db2060 not accessible - https://phabricator.wikimedia.org/T156161#2998390 (Marostegui) >>! In T156161#2997396, @Papaul wrote: > I will need a maintenance window set for this system on Monday from 10am to 1pm for the controller replacement. Thanks Thanks! No problem, I wil...
[07:52:25] (PS1) Marostegui: db-codfw.php: Depool db2060 [mediawiki-config] - https://gerrit.wikimedia.org/r/335980 (https://phabricator.wikimedia.org/T156161)
[09:09:51] !log Started nodetool-a cleanup on aqs1005 (after 1008-{ab} bootstraps)
[09:09:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:46] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:54:46] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[10:06:36] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#2979128 (faidon) We are working on our backup policy, but what is requested here -even the diverse geography, let alone the non-WMF-control...
[10:08:16] Operations, Monitoring: monitor smart wearout indicators in icinga checks - https://phabricator.wikimedia.org/T157159#2998430 (faidon)
[10:08:18] Operations, Monitoring, Operations-Software-Development: monitor SSD wear levels - https://phabricator.wikimedia.org/T86556#2998433 (faidon)
[11:44:57] !log Started nodetool-a cleanup on aqs1008 (after 1008-{ab} bootstraps)
[11:45:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:06] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:21:06] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[13:51:06] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:04:43] (PS1) Elukey: Add aqs1009-a to the AQS Cassandra cluster [puppet] - https://gerrit.wikimedia.org/r/335993 (https://phabricator.wikimedia.org/T155654)
[14:06:09] (PS2) Elukey: Add aqs1009-a to the AQS Cassandra cluster [puppet] - https://gerrit.wikimedia.org/r/335993 (https://phabricator.wikimedia.org/T155654)
[14:07:12] (PS3) Elukey: Add aqs1009-a to the AQS Cassandra cluster [puppet] - https://gerrit.wikimedia.org/r/335993 (https://phabricator.wikimedia.org/T155654)
[14:20:06] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[16:25:06] PROBLEM - puppet last run on graphite1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:54:06] RECOVERY - puppet last run on graphite1002 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[17:24:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 64999 MB (12% inode=99%)
[17:32:44] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2998632 (Paladox) It turns out we can support this right now. By doing this [database] driver = org.mariadb.jdbc.Driver host...
[17:47:50] (PS1) Paladox: Add mariadb-java-client [debs/gerrit] - https://gerrit.wikimedia.org/r/336002 (https://phabricator.wikimedia.org/T145885)
[17:48:15] (CR) Paladox: "Upstream helped me find a way to support using this plugin :)" [debs/gerrit] - https://gerrit.wikimedia.org/r/336002 (https://phabricator.wikimedia.org/T145885) (owner: Paladox)
[17:53:16] (Draft1) Paladox: Gerrit: Use the mariadb plugin instead of mysql [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885)
[17:53:16] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:53:20] (PS2) Paladox: Gerrit: Use the mariadb plugin instead of mysql [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885)
[17:55:20] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2998648 (Paladox) @demon @jcrespo @Marostegui Upstream helped me find a way we can support the mariadb plugin without needing t...
[17:55:37] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2998649 (Paladox) p:Low>Normal
[18:08:58] (CR) Paladox: [C: 1] "Tested on my puppetmaster setup on gerrit-test3." [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885) (owner: Paladox)
[18:09:10] (CR) Paladox: "Tested on gerrit-test3 and works, see https://gerrit.git.wmflabs.org/r/#/c/1/14" [debs/gerrit] - https://gerrit.wikimedia.org/r/336002 (https://phabricator.wikimedia.org/T145885) (owner: Paladox)
[18:22:16] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[18:38:36] (PS1) Urbanecm: Namespace changes for elwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/336004 (https://phabricator.wikimedia.org/T157187)
[18:41:04] (CR) Aklapper: "I don't see reasons provided why this patch was written in this form - especially arguments why max_execution is changed from 10 to 15 and" [puppet] - https://gerrit.wikimedia.org/r/335714 (https://phabricator.wikimedia.org/T125357) (owner: Paladox)
[18:41:56] (CR) Paladox: "> I don't see reasons provided why this patch was written in this" [puppet] - https://gerrit.wikimedia.org/r/335714 (https://phabricator.wikimedia.org/T125357) (owner: Paladox)
[19:04:52] Operations, DBA: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2998703 (Marostegui) >>! In T156008#2987013, @jcrespo wrote: > I chhanged the master of dbstore1001. Resolving now, but let's monitor dbstore1001 to make sure nothing broke (because its delayed replication...
[19:08:50] Operations, Discovery, Discovery-Search, Elasticsearch, and 2 others: Setup a private elasticsearch cluster for phabricator - https://phabricator.wikimedia.org/T156939#2998714 (Aklapper) > We need to setup a elastic search cluster for phabricator urgently as it seems mysql full text search query'...
[19:13:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 64154 MB (12% inode=99%)
[19:14:06] Operations, Discovery, Discovery-Search, Elasticsearch, and 2 others: Setup a private elasticsearch cluster for phabricator - https://phabricator.wikimedia.org/T156939#2998716 (Paladox) >>! In T156939#2998714, @Aklapper wrote: >> We need to setup a elastic search cluster for phabricator urgently...
[19:24:20] !log Started nodetool-b cleanup on aqs1005 (after 1008-{ab} bootstraps)
[19:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:25:55] (PS3) Paladox: Gerrit: Use the mariadb plugin instead of mysql [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885)
[19:25:58] (CR) Paladox: [C: 1] Gerrit: Use the mariadb plugin instead of mysql [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885) (owner: Paladox)
[19:28:06] Operations, DBA, Gerrit, Patch-For-Review: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2998732 (Paladox)
[19:31:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 63980 MB (12% inode=99%)
[19:34:34] (CR) Paladox: [C: 1] "Upstream helped me here https://gerrit-review.googlesource.com/#/c/95673/ :)" [puppet] - https://gerrit.wikimedia.org/r/336003 (https://phabricator.wikimedia.org/T145885) (owner: Paladox)
[20:12:27] Operations, Discovery, Discovery-Search, Elasticsearch, and 2 others: Setup a private elasticsearch cluster for phabricator - https://phabricator.wikimedia.org/T156939#2998743 (greg) Open>Invalid >>! In T156939#2998716, @Paladox wrote: >>>! In T156939#2998714, @Aklapper wrote: >>> We need...
[20:13:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 64580 MB (12% inode=99%)
[21:01:56] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:07:36] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[21:08:26] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3270553 keys, up 96 days 12 hours - replication_delay is 0
[21:09:56] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [1000.0]
[21:10:56] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[21:13:05] (CR) Hashar: "recheck" [puppet] - https://gerrit.wikimedia.org/r/299151 (owner: Hashar)
[21:13:45] (CR) jerkins-bot: [V: -1] zuul: rspec tests [puppet] - https://gerrit.wikimedia.org/r/299151 (owner: Hashar)
[21:14:34] (PS3) Hashar: labstore: check should search for exact mount match [puppet] - https://gerrit.wikimedia.org/r/333230 (https://phabricator.wikimedia.org/T155820)
[21:16:16] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: / 2354 MB (8% inode=90%): /var/lib/elasticsearch 84034 MB (16% inode=99%)
[21:18:26] PROBLEM - Disk space on elastic1040 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=90%)
[21:18:56] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [1000.0]
[21:25:16] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:25:26] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 642 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3270900 keys, up 96 days 13 hours - replication_delay is 642
[21:29:56] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[21:30:06] PROBLEM - Redis replication status tcp_6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.32.133 on port 6479
[21:31:06] RECOVERY - Redis replication status tcp_6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 10.192.32.133:6479 has 1 databases (db0) with 3271628 keys, up 96 days 13 hours - replication_delay is 0
[21:31:26] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3271438 keys, up 96 days 13 hours - replication_delay is 0
[21:53:16] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[22:11:56] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0]
[22:53:56] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:20:56] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures