[02:23:20] PROBLEM - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 112.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[02:26:30] PROBLEM - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is CRITICAL: 107.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[02:32:16] PROBLEM - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 116.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[04:32:54] PROBLEM - Old JVM GC check - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is CRITICAL: 107.6 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[04:59:28] PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[05:12:30] PROBLEM - Old JVM GC check - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[05:29:10] RECOVERY - Old JVM GC check - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[06:02:52] RECOVERY - Old JVM GC check - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[06:12:34] RECOVERY - Old JVM GC check - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 77.29 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[06:30:54] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[06:32:32] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[07:20:06] RECOVERY - Old JVM GC check - cloudelastic1001-cloudelastic-chi-eqiad on cloudelastic1001 is OK: (C)100 gt (W)80 gt 78.62 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad
[07:25:27] (PS1) KartikMistry: apertium-eu-es: Fix FTBFS with apertium 3.6 [debs/contenttranslation/apertium-eu-es] - https://gerrit.wikimedia.org/r/582410 (https://phabricator.wikimedia.org/T247585)
[08:29:59] (CR) Elukey: admin: add more documentation to analytics posix groups (1 comment) [puppet] - https://gerrit.wikimedia.org/r/582064 (https://phabricator.wikimedia.org/T246578) (owner: Elukey)
[08:46:01] (PS3) Elukey: admin: add more documentation to analytics posix groups [puppet] - https://gerrit.wikimedia.org/r/582064 (https://phabricator.wikimedia.org/T246578)
[08:46:03] (PS2) Elukey: admin: deprecate piwik-roots [puppet] - https://gerrit.wikimedia.org/r/582066 (https://phabricator.wikimedia.org/T246578)
[08:46:05] (PS2) Elukey: admin: flag notebook-roots as deprecated [puppet] - https://gerrit.wikimedia.org/r/582068 (https://phabricator.wikimedia.org/T246578)
[08:46:07] (PS3) Elukey: admin: refactor eventlogging-related groups [puppet] - https://gerrit.wikimedia.org/r/582070 (https://phabricator.wikimedia.org/T246578)
[08:46:09] (PS3) Elukey: admin: use the *analytics_admins_members placeholder when possible [puppet] - https://gerrit.wikimedia.org/r/582071 (https://phabricator.wikimedia.org/T246578)
[08:46:11] (PS3) Elukey: admin: deprecate aqs-users [puppet] - https://gerrit.wikimedia.org/r/582072 (https://phabricator.wikimedia.org/T246578)
[08:46:13] (PS1) Elukey: statistics::sites::analytics: remove reference to statistics-web-users [puppet] - https://gerrit.wikimedia.org/r/582440 (https://phabricator.wikimedia.org/T246578)
[08:46:50] (CR) Elukey: "Jbond: added https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/582440/ before this code review :)" [puppet] - https://gerrit.wikimedia.org/r/582064 (https://phabricator.wikimedia.org/T246578) (owner: Elukey)
[08:50:41] (CR) jerkins-bot: [V: -1] statistics::sites::analytics: remove reference to statistics-web-users [puppet] - https://gerrit.wikimedia.org/r/582440 (https://phabricator.wikimedia.org/T246578) (owner: Elukey)
[08:56:16] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[09:01:06] "fatal: Could not read from remote repository."
[09:01:12] sigh
[09:01:36] (CR) Elukey: "recheck" [puppet] - https://gerrit.wikimedia.org/r/582440 (https://phabricator.wikimedia.org/T246578) (owner: Elukey)
[09:02:50] better now
[12:21:36] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[12:31:50] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22355 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[14:45:08] Operations, Wikimedia-Mailing-lists: Please decom reading-wmf mailing list - https://phabricator.wikimedia.org/T248126 (Aklapper) @dr0ptp4kt: Only thing I could imagine in Mailman is auto-unsubscribe due to recognizing bounces to an email address (not sure what's the threshold / config for that though)....
[15:03:26] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:13:40] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22351 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:19:56] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[15:30:10] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3064 is OK: HTTP OK: HTTP/1.0 200 OK - 22354 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[16:25:32] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[16:27:38] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:08:26] (PS1) JJMC89: Remove grants for tboverride and tboverride-account [mediawiki-config] - https://gerrit.wikimedia.org/r/582531 (https://phabricator.wikimedia.org/T241114)
[20:09:50] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:09:52] (PS2) JJMC89: Remove grants for tboverride and tboverride-account [mediawiki-config] - https://gerrit.wikimedia.org/r/582531 (https://phabricator.wikimedia.org/T241114)
[20:11:54] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[20:44:21] (CR) Aklapper: "Patch should be abandoned (bit I lack permissions) as the files were removed in e1c82d0b103c59cbaaea1d7bf80b5f6c6319b941 and 477a379136ae4" [puppet] - https://gerrit.wikimedia.org/r/328466 (https://phabricator.wikimedia.org/T153816) (owner: Tim Landscheidt)
[20:44:39] Operations, Puppet, Patch-For-Review: apache::static_site is not working - https://phabricator.wikimedia.org/T153816 (Aklapper) Open→Invalid The files were removed in e1c82d0b103c59cbaaea1d7bf80b5f6c6319b941 / 477a379136ae42cd26b5ed23a49088e2f3d7ee77
[20:49:40] (CR) Aklapper: [C: -1] "Someone please abandon. Thanks." [puppet] - https://gerrit.wikimedia.org/r/328466 (https://phabricator.wikimedia.org/T153816) (owner: Tim Landscheidt)
[20:53:44] Puppet, Toolforge, Patch-For-Review: role::puppetmaster::standalone clones Git repositories as gitpuppet, git-sync-upstream overwrites them as root - https://phabricator.wikimedia.org/T152059 (Aklapper) In the meantime, * `labmon1001` was renamed to `cloudmetrics` in 95fcb029bef97b7422ceb9f7054d4e23b...
[20:54:40] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[20:56:34] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3060 is OK: HTTP OK: HTTP/1.0 200 OK - 22359 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[22:02:27] (Abandoned) Jcrespo: apache: Fix some issues with apache::static_site [puppet] - https://gerrit.wikimedia.org/r/328466 (https://phabricator.wikimedia.org/T153816) (owner: Tim Landscheidt)
[23:22:54] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1085 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[23:28:56] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1085 is OK: HTTP OK: HTTP/1.0 200 OK - 22312 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server