[00:18:28] (PS1) GTirloni: shinken - Tweak Puppet thresholds [puppet] - https://gerrit.wikimedia.org/r/463581 (https://phabricator.wikimedia.org/T161898)
[00:24:39] (PS1) Zoranzoki21: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/463582 (https://phabricator.wikimedia.org/T205747)
[00:28:05] Critical Alert for device cr2-eqdfw.wikimedia.org - Primary outbound port utilisation over 80%
[00:28:08] Critical Alert for device cr2-codfw.wikimedia.org - Primary outbound port utilisation over 80%
[00:28:12] Critical Alert for device cr2-eqdfw.wikimedia.org - Primary inbound port utilisation over 80%
[00:32:53] (PS3) Dzahn: mw_maintenance: temp hack to avoid duplicate crons on switch to eqiad [puppet] - https://gerrit.wikimedia.org/r/463563 (https://phabricator.wikimedia.org/T201343)
[00:32:56] (PS1) Zoranzoki21: Change acewiki default time zone to UTC+7 [mediawiki-config] - https://gerrit.wikimedia.org/r/463584 (https://phabricator.wikimedia.org/T205693)
[00:35:17] (CR) Dzahn: [C: -2] mw_maintenance: temp hack to avoid duplicate crons on switch to eqiad [puppet] - https://gerrit.wikimedia.org/r/463563 (https://phabricator.wikimedia.org/T201343) (owner: Dzahn)
[00:35:17] (CR) Dzahn: [C: -2] "https://puppet-compiler.wmflabs.org/compiler1002/12677/mwmaint1001.eqiad.wmnet/change.mwmaint1001.eqiad.wmnet.err" [puppet] - https://gerrit.wikimedia.org/r/463563 (https://phabricator.wikimedia.org/T201343) (owner: Dzahn)
[00:42:05] C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary outbound port utilisation over 80%
[00:42:08] C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-eqdfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[00:43:05] C̶r̶i̶t̶i̶c̶a̶l̶ Device cr2-codfw.wikimedia.org recovered from Primary outbound port utilisation over 80%
[01:13:07] !log krinkle@deploy1001 Synchronized
php-1.32.0-wmf.23/extensions/ORES/includes/ORESService.php: T205651 - I1beaeab732a31d (duration: 00m 59s)
[01:13:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:13:12] T205651: Unexpected error localisation for HTTP timeout in log message - https://phabricator.wikimedia.org/T205651
[01:31:23] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.00 seconds
[01:36:52] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 335.00 seconds
[01:37:24] Operations, Wikimedia-Mailing-lists: I get a "403 Forbidden" error when subscribing to a list - https://phabricator.wikimedia.org/T205694 (Klein) I tried restarting the router since I have a dynamic IP but it's still the same thing. :/
[01:42:13] PROBLEM - MariaDB Slave Lag: m3 on db2042 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 339.00 seconds
[02:03:23] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[02:07:42] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[02:22:52] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[02:27:12] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[02:42:23] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the
threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[02:57:53] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[02:58:53] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:00:13] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153)
[03:01:02] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:03:03] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 2.83 ms
[03:05:23] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 6.35 ms
[03:28:43] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 871.74 seconds
[03:46:02] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 287.83 seconds
[04:36:52] RECOVERY - MariaDB Slave Lag: m3 on db2042 is OK: OK slave_sql_lag Replication lag: 23.00 seconds
[09:38:40] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (Aklapper) @Shahadat: To clarify, do these three email addresses belong to three different people?
[11:03:57] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (Shahadat) No, same person. Do you need different email address from different person?
[11:28:59] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (jayantanth) Hi, @Shahadat, although you have initiated this mailing list request, it will be better maintain three different person should be admin. So I would re...
[13:01:56] !log tools-mail cleaned frozen messages in exim queue
[13:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:50] (PS1) GTirloni: tools-mail - Add strict rules against spam [puppet] - https://gerrit.wikimedia.org/r/463611 (https://phabricator.wikimedia.org/T202558)
[13:43:34] gtirloni: hey :)
[13:43:45] paravoid: hi!
[13:44:06] great to see that you're looking at tools-mail :)
[13:45:01] https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/148917/ and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/237871/ may be of interest to you as well
[13:45:20] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/463584 (https://phabricator.wikimedia.org/T205693) (owner: Zoranzoki21)
[13:45:27] yeah, just confirmed these extra checks worked against our current spammy friends.. they should have a harder time soon
[13:45:40] and https://phabricator.wikimedia.org/T41785 if you haven't seen it already
[13:46:11] ah yeah, that should fix the overall situation in a much better way
[13:46:12] not antispam-related, but if you're looking into tools-mail in general ;)
[13:46:26] just putting a bandaid on tools-mail today :)
[13:46:36] sure :)
[13:46:53] but i'll check those changes, thanks.. b.rooke also sent me one about rate-limiting we were considering, i'll check that out soon
[13:47:37] awesome
[13:47:57] (CR) Urbanecm: [C: -1] "See the comment."
(1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/463582 (https://phabricator.wikimedia.org/T205747) (owner: Zoranzoki21)
[13:48:00] I'm not sure whether (and how) the mx-out efforts and tools-mail align, but maybe we should talk about it next week :)
[13:48:31] yep, let's do that. i think there's a great overlap, glad to exchange ideas
[13:49:44] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (Shahadat)
[13:51:39] (PS2) Urbanecm: Initial configuration for liwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/463479 (https://phabricator.wikimedia.org/T205710)
[13:51:52] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (Shahadat) @Aklapper , I have updated this task with email address. Thanks.
[13:52:26] (PS3) Urbanecm: Initial configuration for liwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/463479 (https://phabricator.wikimedia.org/T205710)
[13:52:28] (CR) GTirloni: [C: 2] "puppet-compiler checked a few dozen hosts without relevant issues - https://puppet-compiler.wmflabs.org/compiler1002/12678/" [puppet] - https://gerrit.wikimedia.org/r/463611 (https://phabricator.wikimedia.org/T202558) (owner: GTirloni)
[13:52:34] Operations, Wikimedia-Mailing-lists, Bengali-Sites: Set up mailing list for Bengali Wikibooks - https://phabricator.wikimedia.org/T203736 (Shahadat)
[13:53:45] (PS2) Urbanecm: Initial configuration for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/463482 (https://phabricator.wikimedia.org/T205546)
[13:54:49] (CR) jerkins-bot: [V: -1] Initial configuration for yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/463482 (https://phabricator.wikimedia.org/T205546) (owner: Urbanecm)
[13:56:24] (PS3) Urbanecm: Initial configuration for
yuewiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/463482 (https://phabricator.wikimedia.org/T205546)
[14:00:53] (PS3) MSantos: Fix: Regenerate map tiles up to zoom level 9 [puppet] - https://gerrit.wikimedia.org/r/463542 (https://phabricator.wikimedia.org/T202201)
[14:38:56] could someone merge this when you have a chance? https://gerrit.wikimedia.org/r/c/integration/config/+/463615
[14:39:01] (if it looks right)
[14:39:20] basically I want to call `npm test`
[14:46:02] Looks right
[14:49:00] davidwbarratt: done
[14:49:20] Reedy yay! thanks!
[15:36:51] I'm assuming that node 6 is the newest version of node available on Jenkins?
[16:27:38] (PS2) Zoranzoki21: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/463582 (https://phabricator.wikimedia.org/T205747)
[16:28:38] (CR) Zoranzoki21: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/463582 (https://phabricator.wikimedia.org/T205747) (owner: Zoranzoki21)
[16:32:52] PROBLEM - Device not healthy -SMART- on db1067 is CRITICAL: cluster=mysql device=megaraid,7 instance=db1067:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1067&var-datasource=eqiad%2520prometheus%252Fops
[16:37:03] Operations, ops-eqiad, DBA: db1067 (enwiki master) disk #7 with errors - https://phabricator.wikimedia.org/T205780 (Marostegui)
[16:37:35] Operations, ops-eqiad, DBA: db1067 (enwiki master) disk #7 with errors - https://phabricator.wikimedia.org/T205780 (Marostegui) p:Triage>High
[16:37:54] ACKNOWLEDGEMENT - Device not healthy -SMART- on db1067 is CRITICAL: cluster=mysql device=megaraid,7 instance=db1067:9100 job=node site=eqiad Marostegui T205780 - The acknowledgement expires at: 2018-10-03 16:37:14.
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1067&var-datasource=eqiad%2520prometheus%252Fops
[17:15:10] (CR) MSantos: "Thanks @BearND!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/463542 (https://phabricator.wikimedia.org/T202201) (owner: MSantos)
[17:25:15] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/463582 (https://phabricator.wikimedia.org/T205747) (owner: Zoranzoki21)
[18:58:53] RECOVERY - Ensure legal html en.wp on en.wikipedia.org is OK: all html is present.
[19:07:05] Warning Alert for device cr4-ulsfo.wikimedia.org - Inbound interface errors
[19:18:06] W̶a̶r̶n̶i̶n̶g̶ Device cr4-ulsfo.wikimedia.org recovered from Inbound interface errors
[19:24:32] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[19:26:43] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[19:47:42] d/win go xcombelle
[19:53:27] (PS2) Urbanecm: Change acewiki default time zone to Asia/Jakarta [mediawiki-config] - https://gerrit.wikimedia.org/r/463584 (https://phabricator.wikimedia.org/T205693) (owner: Zoranzoki21)
[19:55:19] (PS3) Urbanecm: Change acewiki default time zone to Asia/Jakarta [mediawiki-config] - https://gerrit.wikimedia.org/r/463584 (https://phabricator.wikimedia.org/T205693) (owner: Zoranzoki21)
[19:56:55] (CR) Urbanecm: [C: 1] "LGTM, previous patch was just a rebase to remove invalid dependency."
[mediawiki-config] - https://gerrit.wikimedia.org/r/463584 (https://phabricator.wikimedia.org/T205693) (owner: Zoranzoki21)
[19:57:03] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[19:59:22] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[20:18:33] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[20:20:43] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[20:28:13] PROBLEM - Filesystem available is greater than filesystem size on ms-be2043 is CRITICAL: cluster=swift device=/dev/sde1 fstype=xfs instance=ms-be2043:9100 job=node mountpoint=/srv/swift-storage/sde1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2043&var-datasource=codfw%2520prometheus%252Fops
[21:06:22] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[21:17:12] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[21:51:43] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the
critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[21:54:02] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[22:07:02] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[22:09:13] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[22:33:33] PROBLEM - Filesystem available is greater than filesystem size on ms-be2042 is CRITICAL: cluster=swift device=/dev/sdk1 fstype=xfs instance=ms-be2042:9100 job=node mountpoint=/srv/swift-storage/sdk1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops