[00:04:52] PROBLEM - Check systemd state on restbase-dev1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:05:52] RECOVERY - Check systemd state on restbase-dev1004 is OK: OK - running: The system is fully operational
[00:08:39] !log restarting jenkins to finalize updates
[00:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:11:23] (Abandoned) Thcipriani: l10nupdate: turn back on after holidays [puppet] - https://gerrit.wikimedia.org/r/399840 (owner: Thcipriani)
[00:13:19] (PS8) Thcipriani: Scap: scap_source correct gid [puppet] - https://gerrit.wikimedia.org/r/361796
[00:39:05] (PS4) Krinkle: Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[00:39:45] (CR) Krinkle: [C: 2] Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[00:40:58] (Merged) jenkins-bot: Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[00:41:29] (CR) jenkins-bot: Support PHPUnit 6.5 in composer.json [mediawiki-config] - https://gerrit.wikimedia.org/r/435626 (owner: Reedy)
[00:57:31] (PS1) Kaldari: Removing wmgUseCongressLookup from InitialiseSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/437377
[01:06:11] (PS1) Kaldari: Testing page creation log on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/437379 (https://phabricator.wikimedia.org/T196400)
[01:06:33] (CR) Kaldari: [C: 2] Removing wmgUseCongressLookup from InitialiseSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/437377 (owner: Kaldari)
[01:07:47] (Merged) jenkins-bot: Removing wmgUseCongressLookup from InitialiseSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/437377 (owner: Kaldari)
[01:08:43] (CR) jenkins-bot: Removing wmgUseCongressLookup from InitialiseSettings-labs.php [mediawiki-config] - https://gerrit.wikimedia.org/r/437377 (owner: Kaldari)
[01:17:25] (CR) Dzahn: "i did not merge that yet because i had a little worry whether the additional service IPs and LVS puppet part might influence prod phab, tr" [puppet] - https://gerrit.wikimedia.org/r/437300 (https://phabricator.wikimedia.org/T196019) (owner: Dzahn)
[01:59:46] (PS1) Dzahn: remove mariadb includes from mw-maintenance role [puppet] - https://gerrit.wikimedia.org/r/437382
[02:02:45] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[02:03:35] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 282 bytes in 0.003 second response time
[02:03:42] hmm...
[02:20:43] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.6) (duration: 08m 21s)
[02:20:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:57] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Tue Jun 5 02:30:57 UTC 2018 (duration 10m 15s)
[02:31:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:41:35] !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): Disable style without labels
[02:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:43:57] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@074d01a] (cleartables): Disable style without labels (duration: 02m 22s)
[02:44:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:45:17] !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): Disable more sources
[02:45:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:45:43] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@074d01a] (cleartables): Disable more sources (duration: 00m 26s)
[02:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:49:26] !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): enable v3view
[02:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:51:28] !log pnorman@deploy1001 Started deploy [tilerator/deploy@074d01a] (cleartables): enable v3view
[02:51:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:51:34] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@074d01a] (cleartables): enable v3view (duration: 00m 06s)
[02:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:07:24] !log pnorman@deploy1001 Started deploy [tilerator/deploy@9e40702] (cleartables): Re-try 074d01a
[03:07:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:12:14] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@9e40702] (cleartables): Re-try 074d01a (duration: 04m 50s)
[03:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:19:00] !log pnorman@deploy1001 Started deploy [tilerator/deploy@9e40702] (cleartables): Re-try 074d01a
[03:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:02] !log pnorman@deploy1001 Finished deploy [tilerator/deploy@9e40702] (cleartables): Re-try 074d01a (duration: 06m 02s)
[03:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:25:52] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 878.69 seconds
[04:16:32] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 171.16 seconds
[05:00:02] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w
[05:00:02] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[05:10:43] (PS1) Marostegui: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/437383 (https://phabricator.wikimedia.org/T191316)
[05:12:24] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/437383 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:13:34] (Merged) jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/437383 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:13:51] (CR) jenkins-bot: db-eqiad.php: Depool db1100 [mediawiki-config] - https://gerrit.wikimedia.org/r/437383 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:14:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1100 for alter table (duration: 00m 51s)
[05:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:49] !log Deploy schema change on db1100 - T191316 T192926 T89737 T195193
[05:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:14:55] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[05:14:55] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:14:56] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:14:56] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:35:16] (PS1) Marostegui: db-eqiad.php: Depool db2059 [mediawiki-config] - https://gerrit.wikimedia.org/r/437384
[05:38:19] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db2059 [mediawiki-config] - https://gerrit.wikimedia.org/r/437384 (owner: Marostegui)
[05:39:27] (Merged) jenkins-bot: db-eqiad.php: Depool db2059 [mediawiki-config] - https://gerrit.wikimedia.org/r/437384 (owner: Marostegui)
[05:39:43] (CR) jenkins-bot: db-eqiad.php: Depool db2059 [mediawiki-config] - https://gerrit.wikimedia.org/r/437384 (owner: Marostegui)
[05:44:32] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db2059 for reimage (duration: 00m 51s)
[05:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:44:50] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4255833 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db2...
[05:45:50] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/437385
[05:47:02] PROBLEM - Host db2059 is DOWN: PING CRITICAL - Packet loss = 100%
[05:47:59] ^ that is the reimage
[05:49:07] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/437385 (owner: Marostegui)
[05:50:16] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/437385 (owner: Marostegui)
[05:50:52] RECOVERY - Host db2059 is UP: PING OK - Packet loss = 0%, RTA = 36.12 ms
[05:51:25] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1100" [mediawiki-config] - https://gerrit.wikimedia.org/r/437385 (owner: Marostegui)
[05:52:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1100 after alter table (duration: 00m 50s)
[05:52:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:13] (PS1) Marostegui: install_server: Allow reimage db2059 [puppet] - https://gerrit.wikimedia.org/r/437386
[05:54:30] (CR) Marostegui: [C: 2] install_server: Allow reimage db2059 [puppet] - https://gerrit.wikimedia.org/r/437386 (owner: Marostegui)
[05:57:13] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4255850 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db2...
[05:58:23] RECOVERY - Device not healthy -SMART- on db2059 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2059&var-datasource=codfw%2520prometheus%252Fops
[06:19:49] (PS1) Giuseppe Lavagetto: tlsproxy::localssl: allow tuning of timeouts [puppet] - https://gerrit.wikimedia.org/r/437388
[06:27:00] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4255878 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2059.codfw.wmnet'] ``` and were **ALL** successful.
[06:29:26] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/prometheus-puppet-agent-stats]
[06:30:06] PROBLEM - puppet last run on db1092 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/root/.screenrc]
[06:31:18] (PS1) Marostegui: Revert "install_server: Allow reimage db2059" [puppet] - https://gerrit.wikimedia.org/r/437390
[06:32:04] (CR) Marostegui: [C: 2] Revert "install_server: Allow reimage db2059" [puppet] - https://gerrit.wikimedia.org/r/437390 (owner: Marostegui)
[06:34:25] (PS1) Marostegui: db-codfw.php: Repool db2059 after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/437391
[06:37:33] Operations, Analytics-Cluster, Analytics-Kanban, Traffic, and 2 others: TLS security review of the Kafka stack - https://phabricator.wikimedia.org/T182993#4255890 (elukey) Reporting a IRC discussion in here. It would be great to make a list of next steps for: * remove IPSEC completely between Jum...
[06:38:57] (CR) Marostegui: [C: 2] db-codfw.php: Repool db2059 after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/437391 (owner: Marostegui)
[06:40:08] (Merged) jenkins-bot: db-codfw.php: Repool db2059 after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/437391 (owner: Marostegui)
[06:42:38] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2059 after reimage (duration: 00m 50s)
[06:42:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:37] (PS2) Giuseppe Lavagetto: tlsproxy::localssl: allow tuning of timeouts [puppet] - https://gerrit.wikimedia.org/r/437388
[06:45:53] (CR) jenkins-bot: db-codfw.php: Repool db2059 after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/437391 (owner: Marostegui)
[06:48:04] (CR) Muehlenhoff: "Yeah, I tested this on terbium for the latest offboarding." [puppet] - https://gerrit.wikimedia.org/r/436812 (owner: Muehlenhoff)
[06:48:08] (PS7) Muehlenhoff: Implement paged LDAP searches in offboarding script [puppet] - https://gerrit.wikimedia.org/r/436812
[06:48:16] PROBLEM - Router interfaces on cr1-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 46, down: 1, dormant: 0, excluded: 0, unused: 0
[06:49:15] (PS1) Urbanecm: Initial configuration for bnwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/437393 (https://phabricator.wikimedia.org/T196357)
[06:49:20] (CR) Muehlenhoff: [C: 2] Implement paged LDAP searches in offboarding script [puppet] - https://gerrit.wikimedia.org/r/436812 (owner: Muehlenhoff)
[06:55:26] RECOVERY - puppet last run on db1092 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:59:45] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:05:44] (PS3) Giuseppe Lavagetto: tlsproxy::localssl: allow tuning of timeouts [puppet] - https://gerrit.wikimedia.org/r/437388
[07:05:46] (PS1) Giuseppe Lavagetto: videoscaler: up the TLS termination read_timeout to 1 day [puppet] - https://gerrit.wikimedia.org/r/437394
[07:05:57] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler02/11374/" [puppet] - https://gerrit.wikimedia.org/r/437388 (owner: Giuseppe Lavagetto)
[07:11:37] (PS3) Ema: Initial debianization [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/437268 (https://phabricator.wikimedia.org/T196355)
[07:18:51] (CR) Ema: [C: 1] "Two nits, LGTM otherwise." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/437388 (owner: Giuseppe Lavagetto)
[07:22:02] <_joe_> ema: regarding adding "s" to the keepalive timeout, I tried to make the change a noop
[07:22:12] <_joe_> as in, not change a byte on disk in production
[07:22:14] (Abandoned) Samwilson: Revert "Deploy GlobalPreferences to test wikis and mw.org (fourth time)" [mediawiki-config] - https://gerrit.wikimedia.org/r/433291 (owner: Samwilson)
[07:24:08] (CR) Giuseppe Lavagetto: tlsproxy::localssl: allow tuning of timeouts (2 comments) [puppet] - https://gerrit.wikimedia.org/r/437388 (owner: Giuseppe Lavagetto)
[07:24:53] (PS4) Giuseppe Lavagetto: tlsproxy::localssl: allow tuning of timeouts [puppet] - https://gerrit.wikimedia.org/r/437388
[07:24:55] (PS2) Giuseppe Lavagetto: videoscaler: up the TLS termination read_timeout to 1 day [puppet] - https://gerrit.wikimedia.org/r/437394
[07:25:28] (CR) Giuseppe Lavagetto: [C: 2] tlsproxy::localssl: allow tuning of timeouts [puppet] - https://gerrit.wikimedia.org/r/437388 (owner: Giuseppe Lavagetto)
[07:26:28] (CR) Ema: tlsproxy::localssl: allow tuning of timeouts (1 comment) [puppet] - https://gerrit.wikimedia.org/r/437388 (owner: Giuseppe Lavagetto)
[07:26:38] _joe_: k!
[07:27:05] <_joe_> more in general, I was thinking we should have a more holistic approach at timeouts
[07:27:16] (CR) Muehlenhoff: "Some comments inline. As mentioned in T174465 I'm not fond of the general concept, but with our current setup I can't offer anything clean" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/379004 (https://phabricator.wikimedia.org/T174465) (owner: Ottomata)
[07:28:15] you said holistic!
[07:29:39] (PS4) Ema: Initial debianization [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/437268 (https://phabricator.wikimedia.org/T196355)
[07:30:49] <_joe_> yeah :/
[07:31:02] <_joe_> SV is catching up to me
[07:31:32] (CR) Giuseppe Lavagetto: [C: 2] videoscaler: up the TLS termination read_timeout to 1 day [puppet] - https://gerrit.wikimedia.org/r/437394 (owner: Giuseppe Lavagetto)
[07:32:00] _joe_: jokes aside, what do you mean exactly?
[07:32:29] <_joe_> that we decide one timeout for - say - cache::text and fetch it via hiera with a general label
[07:32:33] <_joe_> and use the same across the stack
[07:32:43] <_joe_> in nginx, varnish, apache2, hhvm
[07:32:54] <_joe_> and do the same with restbase and the services
[07:34:03] (CR) Ema: [C: 2] Initial debianization [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/437268 (https://phabricator.wikimedia.org/T196355) (owner: Ema)
[07:38:07] _joe_: that sounds good, it just might be hard to do in practice. Varnish for instance has an obscene amount of timeouts, most of which might well not have a $other_server equivalent
[07:38:25] PROBLEM - kubelet operational latencies on kubernetes1003 is CRITICAL: instance=kubernetes1003.eqiad.wmnet operation_type={create_container,image_status,podsandbox_status,remove_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:39:35] RECOVERY - kubelet operational latencies on kubernetes1003 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[07:57:23] (PS2) Gehel: Increase number of osm2pgsql processes to 8 [puppet] - https://gerrit.wikimedia.org/r/437301 (owner: Pnorman)
[07:57:31] (PS3) Gehel: Increase number of osm2pgsql processes to 8 [puppet] - https://gerrit.wikimedia.org/r/437301 (owner: Pnorman)
[07:57:59] (CR) jerkins-bot: [V: -1] Increase number of osm2pgsql processes to 8 [puppet] - https://gerrit.wikimedia.org/r/437301 (owner: Pnorman)
[07:58:57] (PS4) Gehel: Increase number of osm2pgsql processes to 8 [puppet] - https://gerrit.wikimedia.org/r/437301 (owner: Pnorman)
[08:02:17] (CR) Gehel: [C: 2] Increase number of osm2pgsql processes to 8 [puppet] - https://gerrit.wikimedia.org/r/437301 (owner: Pnorman)
[08:05:05] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w
[08:05:05] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[08:10:36] !log rebooting elastic10(41|43) for plugin update - T193734
[08:10:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:41] T193734: Move Serbian language wikis from extra-analysis to extra-analysis-serbian plugin - https://phabricator.wikimedia.org/T193734
[08:16:13] !log libvmod-re2 1.3.1-1 uploaded to apt.w.o T196355
[08:16:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:18] T196355: Package libvmod-re2 - https://phabricator.wikimedia.org/T196355
[08:18:25] Operations, Traffic, netops: Configure interface damping on primary links - https://phabricator.wikimedia.org/T196432#4256062 (ayounsi) p:Triage>Normal
[08:18:33] (PS1) Giuseppe Lavagetto: jobrunner/videoscaler: factor out "base" roles to use in beta [puppet] - https://gerrit.wikimedia.org/r/437406
[08:20:04] <_joe_> volans: you have to fix wmf_auto_reimage not to fetch the baseurls on tin :P
[08:20:26] _joe_: I think it was fixed already by mut.ante
[08:20:46] Operations, Traffic, netops: Configure interface damping on primary links - https://phabricator.wikimedia.org/T196432#4256087 (ayounsi)
[08:20:49] at least I saw a CR passing by, didn't follow it closely though
[08:20:52] I can verify
[08:21:56] <_joe_> ahahahah
[08:22:04] <_joe_> I can't believe you people :P
[08:23:33] _joe_: actually he fixed only the docstring, because the code was already future-proof ;)
[08:23:36] ACKNOWLEDGEMENT - Router interfaces on cr1-eqdfw is CRITICAL: CRITICAL: host 208.80.153.198, interfaces up: 46, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi Still waiting for GTT
[08:23:36] https://gerrit.wikimedia.org/r/#/c/436831/2/modules/profile/files/cumin/wmf_auto_reimage_lib.py
[08:24:12] <_joe_> well, how is that host variable fetched?
[08:25:06] dns, deployment.eqiad.wmnet CNAME
[08:26:43] btw we should probably add those CNAMEs to the discovery.wmnet section, so to have a local one per DC and a 'global' master one
[08:29:52] (CR) Mobrovac: [C: 1] Specify videoscalers uri in hiera/changeprop manifest. [puppet] - https://gerrit.wikimedia.org/r/437281 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:30:36] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w
[08:30:36] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[08:32:18] hm
[08:33:11] (PS2) Mobrovac: Disable redis queue for videoscaler jobs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437286 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:33:49] _joe_: mind reviewing https://gerrit.wikimedia.org/r/#/c/437281/ please?
[08:37:22] (CR) Giuseppe Lavagetto: [C: 2] Specify videoscalers uri in hiera/changeprop manifest. [puppet] - https://gerrit.wikimedia.org/r/437281 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:37:28] (PS3) Giuseppe Lavagetto: Specify videoscalers uri in hiera/changeprop manifest. [puppet] - https://gerrit.wikimedia.org/r/437281 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:40:11] <_joe_> running puppet on scb1001
[08:40:54] <_joe_> Pchelolo: want me to force a puppet run on all of scb?
[08:41:26] _joe_: mobrovac is doing that
[08:41:51] <_joe_> heh
[08:43:26] Operations, ops-eqiad, DBA: Degraded RAID on labsdb1009 - https://phabricator.wikimedia.org/T195690#4256161 (MoritzMuehlenhoff) >>! In T195690#4254655, @Marostegui wrote: > I just saw it is in the repo but it wasn't installed > We should add it by default on the puppet recipe for HP hosts probably Y...
[08:45:16] PROBLEM - puppet last run on scb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:45:55] PROBLEM - puppet last run on scb1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:48:31] (CR) Mobrovac: [C: 2] Disable redis queue for videoscaler jobs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437286 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:48:45] PROBLEM - puppet last run on scb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:49:00] I guess you're on top of the scb puppet failures :)
[08:49:44] (Merged) jenkins-bot: Disable redis queue for videoscaler jobs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437286 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:50:00] (CR) jenkins-bot: Disable redis queue for videoscaler jobs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437286 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[08:51:13] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@63b30a6]: Enable videoscaler jobs in kafka T190327
[08:51:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:51:18] T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327
[08:52:02] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@63b30a6]: Enable videoscaler jobs in kafka T190327 (duration: 00m 49s)
[08:52:04] !log mobrovac@deploy1001 Synchronized wmf-config/jobqueue.php: Switch video scaling jobs to EventBus - T190327 (duration: 00m 52s)
[08:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:53:55] RECOVERY - puppet last run on scb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:54:32] (CR) Alexandros Kosiaris: "Started a cross-fleet PCC run" [puppet] - https://gerrit.wikimedia.org/r/437241 (owner: Alexandros Kosiaris)
[08:55:28] (PS1) Marostegui: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/437414 (https://phabricator.wikimedia.org/T191316)
[08:56:05] RECOVERY - puppet last run on scb1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:57:34] (PS1) Jcrespo: mariadb: Depool db1080 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/437415
[08:57:59] (PS1) Muehlenhoff: Install HPSSA diagnostics tool on servers with HP RAID [puppet] - https://gerrit.wikimedia.org/r/437417
[08:58:45] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/437414 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[08:59:54] (Merged) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/437414 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[09:00:09] (PS2) Jcrespo: mariadb: Depool db1080 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/437415
[09:00:46] (CR) Marostegui: [C: 1] Install HPSSA diagnostics tool on servers with HP RAID [puppet] - https://gerrit.wikimedia.org/r/437417 (owner: Muehlenhoff)
[09:01:09] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1110 for alter table (duration: 00m 51s)
[09:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:20] !log Deploy schema change on db1110 - T191316 T192926 T89737 T195193
[09:01:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:01:26] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[09:01:26] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[09:01:27] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[09:01:27] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[09:01:30] (CR) Jcrespo: [C: 2] mariadb: Depool db1080 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/437415 (owner: Jcrespo)
[09:02:06] (PS11) Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - https://gerrit.wikimedia.org/r/435751
[09:02:46] (Merged) jenkins-bot: mariadb: Depool db1080 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/437415 (owner: Jcrespo)
[09:02:49] (CR) jenkins-bot: db-eqiad.php: Depool db1110 [mediawiki-config] - https://gerrit.wikimedia.org/r/437414 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[09:05:37] (CR) Volans: [C: -1] "I think we should have a slightly different approach. See my comments inline." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/437052 (owner: Alex Monk)
[09:05:45] RECOVERY - puppet last run on scb1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:07:16] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregat
[09:07:16] April 29, 2016 returned the unexpected status 504 (expecting: 200): /en.wikipedia.org/v1/page/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[09:07:47] (CR) Jcrespo: [C: 1] Install HPSSA diagnostics tool on servers with HP RAID [puppet] - https://gerrit.wikimedia.org/r/437417 (owner: Muehlenhoff)
[09:08:00] (PS1) Marostegui: mariadb: Set db1120 to spare [puppet] - https://gerrit.wikimedia.org/r/437418 (https://phabricator.wikimedia.org/T196376)
[09:08:59] (CR) jenkins-bot: mariadb: Depool db1080 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/437415 (owner: Jcrespo)
[09:09:40] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 (duration: 00m 49s)
[09:09:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:11:27] (PS10) Elukey: Create profile::analytics::cluster::packages class [puppet] - https://gerrit.wikimedia.org/r/436012 (owner: Ottomata)
[09:13:04] (PS11) Elukey: Create profile::analytics::cluster::packages class [puppet] - https://gerrit.wikimedia.org/r/436012 (owner: Ottomata)
[09:14:05] (CR) Marostegui: "https://puppet-compiler.wmflabs.org/compiler03/11377/" [puppet] - https://gerrit.wikimedia.org/r/437418 (https://phabricator.wikimedia.org/T196376) (owner: Marostegui)
[09:17:04] (CR) Marostegui: [C: 2] mariadb: Set db1120 to spare [puppet] - https://gerrit.wikimedia.org/r/437418 (https://phabricator.wikimedia.org/T196376) (owner: Marostegui)
[09:21:01] (PS1) Marostegui: sX.hosts: Remove db1120 [software] - https://gerrit.wikimedia.org/r/437423 (https://phabricator.wikimedia.org/T196376)
[09:22:13] (PS2) Alexandros Kosiaris: ganeti: Absent DSA SSH key [puppet] - https://gerrit.wikimedia.org/r/437219 (https://phabricator.wikimedia.org/T177371)
[09:22:16] !log stop db1080 for reimage
[09:22:18] (CR) Alexandros Kosiaris: [V: 2 C: 2] ganeti: Absent DSA SSH key [puppet] - https://gerrit.wikimedia.org/r/437219 (https://phabricator.wikimedia.org/T177371) (owner: Alexandros Kosiaris)
[09:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:08] Operations, Analytics, ChangeProp, EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4256392 (Pchelolo)
[09:25:30] Operations: Broken pinning on some WMCS servers - https://phabricator.wikimedia.org/T195835#4256414 (aborrero) Also cleaned `labpuppetmaster1002.wikimedia.org` per @Volans report
[09:26:46] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/437417 (owner: Muehlenhoff)
[09:26:48] (PS12) Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - https://gerrit.wikimedia.org/r/435751
[09:26:50] (PS1) Jcrespo: mariadb: Reimage db1080 with stretch instead of jessie [puppet] - https://gerrit.wikimedia.org/r/437424
[09:30:03] (PS2) Jcrespo: mariadb: Reimage db1080 with stretch instead of jessie [puppet] - https://gerrit.wikimedia.org/r/437424
[09:30:49] (CR) Marostegui: [C: 2] sX.hosts: Remove db1120 [software] - https://gerrit.wikimedia.org/r/437423 (https://phabricator.wikimedia.org/T196376) (owner: Marostegui)
[09:31:37] (Merged) jenkins-bot: sX.hosts: Remove db1120 [software] - https://gerrit.wikimedia.org/r/437423 (https://phabricator.wikimedia.org/T196376) (owner: Marostegui)
[09:31:47] (CR) Jcrespo: [C: 2] mariadb: Reimage db1080 with stretch instead of jessie [puppet] - https://gerrit.wikimedia.org/r/437424 (owner: Jcrespo)
[09:32:33] (PS5) Ema: prometheus: export intel-microcode information via node_exporter [puppet] - https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825)
[09:32:57] (CR) Elukey: "Turns out that what I'd like to do requires a bit more refactoring, so before proceeding any further I am going to explain my idea for the" [puppet] - https://gerrit.wikimedia.org/r/436012 (owner: Ottomata)
[09:33:32] (CR) jerkins-bot: [V: -1] prometheus: export intel-microcode information via node_exporter [puppet] - https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: Ema)
[09:34:48] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/metadata/{title}{/revision} (Get extended metadata of a test page) is CRITICAL: Test Get extended metadata of a test page returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpe
[09:34:48] adingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/page/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense]
[09:35:30] Operations, DBA, Goal, Patch-For-Review: Convert all sanitarium hosts to multi-instance and increase its reliability/redundancy - https://phabricator.wikimedia.org/T190704#4256433 (Marostegui)
[09:41:19] sigh, the page must have been changed (re rb alert)
[09:49:23] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/437427
[09:49:27] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/437427
[09:49:30] (PS1) Jcrespo: mariadb: Repool db1080 after reimage with low load [mediawiki-config] - https://gerrit.wikimedia.org/r/437428
[09:49:34] (PS6) Ema: prometheus: export intel-microcode information via node_exporter [puppet] - https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825)
[09:52:11] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/437427 (owner: Marostegui)
[09:53:43] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - https://gerrit.wikimedia.org/r/437427 (owner: Marostegui)
[09:54:44] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1110 after alter table (duration: 00m 50s)
[09:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:57] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /srv 60893 MB (12% inode=99%)
[09:58:27] (PS1) Muehlenhoff: Enable microcode updates for Parsoid hosts [puppet] - https://gerrit.wikimedia.org/r/437430 (https://phabricator.wikimedia.org/T127825)
[09:58:47] (CR) Marostegui: [C: 1] mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - https://gerrit.wikimedia.org/r/435751 (owner: Jcrespo)
[10:01:10] (CR) Jcrespo: "https://puppet-compiler.wmflabs.org/compiler03/11376/" [puppet] - https://gerrit.wikimedia.org/r/435751 (owner: Jcrespo)
[10:01:22] (PS2) Alexandros Kosiaris: ganeti: Remove DSA ssh key [puppet] - https://gerrit.wikimedia.org/r/437220
(https://phabricator.wikimedia.org/T177371) [10:01:26] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ganeti: Remove DSA ssh key [puppet] - 10https://gerrit.wikimedia.org/r/437220 (https://phabricator.wikimedia.org/T177371) (owner: 10Alexandros Kosiaris) [10:03:24] (03PS2) 10Jcrespo: mariadb: Repool db1080 after reimage with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437428 [10:04:10] 10Operations, 10Wikimedia-Mailing-lists: Reset admin password for wikimedia-mk - https://phabricator.wikimedia.org/T196438#4256494 (10Aklapper) Hi @Misos, thanks for taking the time to report this! Please see https://meta.wikimedia.org/wiki/Mailing_lists/Administration#Lost_list_administrator_passwords for fu... [10:04:30] 10Operations, 10Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#4256500 (10akosiaris) 05Open>03Resolved a:05MoritzMuehlenhoff>03akosiaris With the above 2 changes merged and the DSA private key removed from the repo, I think we can call this don... 
[10:05:42] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1080 after reimage with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437428 (owner: 10Jcrespo) [10:05:53] (03CR) 10Alexandros Kosiaris: [C: 031] Enable microcode updates for Parsoid hosts [puppet] - 10https://gerrit.wikimedia.org/r/437430 (https://phabricator.wikimedia.org/T127825) (owner: 10Muehlenhoff) [10:06:52] (03Merged) 10jenkins-bot: mariadb: Repool db1080 after reimage with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437428 (owner: 10Jcrespo) [10:08:08] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 with low load (duration: 00m 50s) [10:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:12] RECOVERY - Disk space on elastic1017 is OK: DISK OK [10:13:51] (03PS1) 10Jcrespo: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437431 [10:14:10] !log installing batik security updates [10:14:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:15:28] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1110" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437427 (owner: 10Marostegui) [10:15:33] (03CR) 10jenkins-bot: mariadb: Repool db1080 after reimage with low load [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437428 (owner: 10Jcrespo) [10:18:11] !log upgrade qemu to 1:2.8+dfsg-6+deb9u4 on ganeti01.svc.codfw.wmnet [10:18:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:18:16] moritzm: ^ [10:18:42] ack, great, I had been meaning to ping you for that :-) [10:18:54] I guess coupled with the kernel upgrades this mean a round of reboots for all VMs [10:20:01] yeah [10:20:24] !log reboot acrab for kernel upgrade and qemu upgrade [10:20:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:20:29] let's see how that fares for a bit [10:21:00] ack, I'll also doublecheck 
acrab wrt spectre v2 mitigation when it's back up [10:21:28] so, I guess I should pass +spec-ctrl for that, right ? [10:21:37] to qemu cpu flags I mean [10:23:17] (03PS1) 10Marostegui: filtered_tables: Remove ar_text and ar_flags [puppet] - 10https://gerrit.wikimedia.org/r/437432 (https://phabricator.wikimedia.org/T192926) [10:26:45] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437431 (owner: 10Jcrespo) [10:26:54] akosiaris: yes, seems so. on acrab ibpb isn't in the CPU and flags (and the kernel doesn't detect it either) [10:27:57] (03Merged) 10jenkins-bot: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437431 (owner: 10Jcrespo) [10:30:14] (03CR) 10jenkins-bot: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437431 (owner: 10Jcrespo) [10:31:16] akosiaris: if we're rebooting all ganeti instances, I'll quickly upgrade the (jessie/stretch) Linux kernels on those systems, most are currently on 4.9.82, but 4.9.88 is available and brings a number of bugfixes over the initial IPBP support which landed in 4.9.82 [10:32:15] (03PS13) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [10:32:17] (03PS1) 10Jcrespo: mariadb: Reimage db1081 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/437433 [10:32:32] PROBLEM - Device not healthy -SMART- on db2059 is CRITICAL: cluster=mysql device=cciss,11 instance=db2059:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2059&var-datasource=codfw%2520prometheus%252Fops [10:33:34] moritzm: yeah makes sense. But there is no rush, take your time. 
[10:33:49] yep, some time this afternoon [10:35:27] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: do not run the service when there are no runners [puppet] - 10https://gerrit.wikimedia.org/r/437434 [10:36:57] (03PS2) 10Jcrespo: mariadb: Reimage db1081 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/437433 [10:37:02] ACKNOWLEDGEMENT - Device not healthy -SMART- on db2059 is CRITICAL: cluster=mysql device=cciss,11 instance=db2059:9100 job=node site=codfw Marostegui T195867#4249051 - The acknowledgement expires at: 2018-06-11 10:36:00. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2059&var-datasource=codfw%2520prometheus%252Fops [10:38:05] (03CR) 10Jcrespo: [C: 032] mariadb: Reimage db1081 as stretch [puppet] - 10https://gerrit.wikimedia.org/r/437433 (owner: 10Jcrespo) [10:38:27] !log reboot acrab for spec_ctrl CPU flag addition [10:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:40:14] moritzm: ibpb and ibrs showed up in acrab's /proc/cpuinfo [10:41:20] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler03/11378/ noop everywhere now, will stop the jobrunner service on the videoscalers once we mer" [puppet] - 10https://gerrit.wikimedia.org/r/437434 (owner: 10Giuseppe Lavagetto) [10:41:27] (03PS2) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: do not run the service when there are no runners [puppet] - 10https://gerrit.wikimedia.org/r/437434 [10:41:46] akosiaris: ack and it's also correctly detected in dmesg now [10:42:07] <_joe_> I hate ff-only so much... 
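[editor's note] The check described at 10:40:14 (ibpb and ibrs appearing in acrab's /proc/cpuinfo after the spec_ctrl reboot) could be scripted roughly as below. This is a minimal sketch, not what was actually run on the host: the function names `cpu_flags` and `missing_spectre_flags` are hypothetical, and it assumes the usual Linux /proc/cpuinfo layout with a `flags : ...` line per processor.

```python
def cpu_flags(cpuinfo_text):
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No flags line found (empty input or a non-x86 layout).
    return set()


def missing_spectre_flags(cpuinfo_text, wanted=("ibpb", "ibrs")):
    """Return the wanted flags that are absent, in the order they were given."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]
```

On a guest, one would pass in the contents of /proc/cpuinfo; an empty list would mean both flags are exposed, which matches what was reported for acrab after the second reboot.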
[10:42:17] (03CR) 10Giuseppe Lavagetto: [C: 032] profile::mediawiki::jobrunner: do not run the service when there are no runners [puppet] - 10https://gerrit.wikimedia.org/r/437434 (owner: 10Giuseppe Lavagetto) [10:42:18] sudo dmesg | grep "Enabling Indirect Branch Prediction Barrier" [10:42:39] ah nice [10:42:46] 4.9.88 also provides support for securing firmware calls against spectre [10:42:55] [ 0.123398] Spectre V2 : Enabling Restricted Speculation for firmware calls [10:43:00] yup [10:43:04] acrab is already upgraded [10:43:11] yeah, that one [10:43:14] I dare not ask what's going on with firmware calls [10:43:34] I fear a very very big rabbithole lies somewhere there [10:44:06] <_joe_> oh the good ole call_random_binary_blob_with_nsa_payload syscall [10:44:23] <_joe_> man 2, ofc [10:44:46] the fun part is that I enabled the qemu spec_ctrl cpu flag but this one does not show up in either hosts or guests cpu flags [10:44:54] (03PS6) 10Giuseppe Lavagetto: Remove unused jobrunners. [puppet] - 10https://gerrit.wikimedia.org/r/436574 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [10:45:10] <_joe_> Pchelolo: I'm going to merge your change, lgtm [10:45:12] however enabling that made the guest get the IBPB and IBRS cpu flags [10:45:23] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1081 (duration: 00m 50s) [10:45:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:45:26] essentially IBRS wasn't used in request_firmware: https://lkml.org/lkml/2018/2/26/425 [10:45:31] 10Operations, 10Wikimedia-Mailing-lists, 10Patch-For-Review: Make mediawiki-l archives indexed by search engines - https://phabricator.wikimedia.org/T193572#4256567 (10Tgr) [10:45:35] (03CR) 10Giuseppe Lavagetto: [C: 032] Remove unused jobrunners. 
[puppet] - 10https://gerrit.wikimedia.org/r/436574 (https://phabricator.wikimedia.org/T190327) (owner: 10Ppchelko) [10:45:56] it takes a while to figure out spec-ctrl == IBRS and that it also bundles IBPB [10:46:03] but only on intel cpus [10:46:13] _joe_: we'd need to make a bigger cleanup eventually for jobrunners, maybe in a couple of weeks when we finally finish moving everything [10:46:34] <_joe_> yeah [10:46:38] <_joe_> we have time [10:46:41] akosiaris: and it's particular confusing that spectre is an ongoing thing, if one now backports spec_ctrl to a given qemu release, it will not cover the next microcode updates done by Intel [10:46:57] <_joe_> I'm just removing the service on the videoscalers for now [10:47:16] dammit [10:47:23] I hate hardware [10:47:49] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437436 [10:48:27] anyway moritzm: I am leaving acrab for today with those flags enabled. If everything is ok by tomorrow we can start looking into a kernel upgrade/reboot plan for all the ganeti VMs [10:48:49] akosiaris: sounds good! [10:49:05] (03PS1) 10Jcrespo: mariadb: Repool db1081 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437437 [10:49:33] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 [10:49:42] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 (owner: 10Jcrespo) [10:51:06] !log stop db1081 for reimage [10:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:49] (03PS2) 10Mobrovac: Disable redis queue for cirrus except wikipedia, commons and wikidata. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko) [10:53:28] <_joe_> ouch, I might be causing some puppet master troubles right now [10:53:48] <_joe_> I forgot -b in my cumin command, it's running puppet on 41 hosts in parallel [10:55:03] _joe_: ctrl+c is always an option ;) [10:55:36] jynus: i have a mw-config change that i need to push out, can i go or are you in the process of pool/depool? [10:56:13] not touching it for 40 minutes [10:57:24] (03PS1) 10Sau226: Fixing very trivial spelling error in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 [10:58:25] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4256587 (10faidon) So we need to do //something// in a very short amount of time (~two months) ­-- does anyone have a... [10:59:43] PROBLEM - Check size of conntrack table on mw1336 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:00:23] PROBLEM - Check size of conntrack table on mw1308 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:01:01] ok, i'm taking over deploy1001 for 10 minutes [11:01:26] (03CR) 10Mobrovac: [C: 032] Disable redis queue for cirrus except wikipedia, commons and wikidata. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko) [11:02:03] PROBLEM - Check size of conntrack table on mw1309 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:02:23] PROBLEM - Check size of conntrack table on mw1334 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:02:26] what's up? [11:02:32] PROBLEM - Check size of conntrack table on mw1311 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:02:36] (03Merged) 10jenkins-bot: Disable redis queue for cirrus except wikipedia, commons and wikidata. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko) [11:02:51] (03CR) 10jenkins-bot: Disable redis queue for cirrus except wikipedia, commons and wikidata. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436249 (https://phabricator.wikimedia.org/T189137) (owner: 10Ppchelko) [11:03:06] these seem to be jobrunners? [11:03:14] _joe_: looks related to what you were doing maybe? [11:03:52] <_joe_> paravoid: uhm, maybe, it seems we're suddenly too efficient in processing jobs? [11:03:58] <_joe_> I'll take a look [11:04:52] PROBLEM - Check size of conntrack table on mw1336 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:05:02] PROBLEM - Check size of conntrack table on mw1310 is CRITICAL: CRITICAL: nf_conntrack is 93 % full [11:05:03] PROBLEM - Check size of conntrack table on mw1337 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:05:23] PROBLEM - Check size of conntrack table on mw1334 is CRITICAL: CRITICAL: nf_conntrack is 94 % full [11:05:33] PROBLEM - Check size of conntrack table on mw1311 is CRITICAL: CRITICAL: nf_conntrack is 92 % full [11:05:39] elukey@mw1309:~$ sudo sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait [11:05:42] net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120 [11:05:42] should be 65 [11:05:43] fixing [11:05:48] <_joe_> yes, it's mostly local [11:05:57] <_joe_> elukey: I don't think it changed tbh [11:06:12] PROBLEM - Check size of conntrack table on mw1335 is CRITICAL: CRITICAL: nf_conntrack is 94 % full [11:06:13] <_joe_> I just need to reduce the number of calls [11:06:32] PROBLEM - Check size of conntrack table on mw1308 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [11:06:34] <_joe_> but yes, do that manually for now while I prepare a patch, it's faster than rolling back [11:07:12] PROBLEM - Check size of conntrack table on mw1309 is CRITICAL: CRITICAL: nf_conntrack is 91 % full [11:07:20] !log manually set 
net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 (was 120) [11:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:32] PROBLEM - Check size of conntrack table on mw1302 is CRITICAL: CRITICAL: nf_conntrack is 90 % full [11:08:32] RECOVERY - Check size of conntrack table on mw1302 is OK: OK: nf_conntrack is 72 % full [11:08:32] RECOVERY - Check size of conntrack table on mw1334 is OK: OK: nf_conntrack is 74 % full [11:08:33] RECOVERY - Check size of conntrack table on mw1308 is OK: OK: nf_conntrack is 74 % full [11:08:42] RECOVERY - Check size of conntrack table on mw1311 is OK: OK: nf_conntrack is 68 % full [11:08:43] \o/ [11:08:53] RECOVERY - Check size of conntrack table on mw1336 is OK: OK: nf_conntrack is 66 % full [11:09:03] RECOVERY - Check size of conntrack table on mw1310 is OK: OK: nf_conntrack is 68 % full [11:09:13] RECOVERY - Check size of conntrack table on mw1337 is OK: OK: nf_conntrack is 64 % full [11:09:13] RECOVERY - Check size of conntrack table on mw1309 is OK: OK: nf_conntrack is 69 % full [11:09:13] RECOVERY - Check size of conntrack table on mw1335 is OK: OK: nf_conntrack is 66 % full [11:09:27] (03PS1) 10Giuseppe Lavagetto: jobrunner: reduce the number of runners, as we dropped a lot of jobs [puppet] - 10https://gerrit.wikimedia.org/r/437443 [11:09:48] <_joe_> elukey: https://gerrit.wikimedia.org/r/437443 [11:09:56] <_joe_> what do you think? [11:10:18] <_joe_> we are now executing way less different kind of jobs, we don't need all those runners [11:11:06] <_joe_> elukey: can you check across the fleet? [11:11:19] <_joe_> I'm pretty sure base::firewall is included everywhere [11:11:54] ack will do! about the runners, what jobs were dropped? 
I might miss some context sorry :( [11:12:14] <_joe_> hah sorry, we are now only running cirrus jobs on the old jobqueue [11:12:19] <_joe_> plus a couple more [11:13:00] <_joe_> the old queue is now 8 jobs long, I'll merge this change [11:13:10] <_joe_> but we need to understand why that parameter was wrong [11:13:15] (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner: reduce the number of runners, as we dropped a lot of jobs [puppet] - 10https://gerrit.wikimedia.org/r/437443 (owner: 10Giuseppe Lavagetto) [11:16:48] <_joe_> ok, it seems that setting doesn't work on stretch [11:17:08] <_joe_> or better, that sysctl::parameters doesn't [11:19:46] IIRC it was a race condition (https://phabricator.wikimedia.org/T136094) [11:21:01] so scanning mw* via cumin I can see 202 hosts with 120 [11:21:05] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@aa5e94b]: Enable cirrus jobs in kafka for everything except wikipedia, wikidata and commons T190327 [11:21:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:10] T190327: FY17/18 Q4 Program 8 Services Goal: Complete the JobQueue transition to EventBus - https://phabricator.wikimedia.org/T190327 [11:21:46] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@aa5e94b]: Enable cirrus jobs in kafka for everything except wikipedia, wikidata and commons T190327 (duration: 00m 41s) [11:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:57] !log mobrovac@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Switch CirrusSearch jobs for all wikis except wp, wd, commons - T189137 (duration: 00m 51s) [11:22:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:02] T189137: Migrate CirrusSearch jobs to Kafka queue - https://phabricator.wikimedia.org/T189137 [11:23:29] <_joe_> elukey: the old jobqueue has now 2 jobs!! [11:25:27] woah [11:25:57] _joe_ ok if I set 65 everywhere ? 
(net.netfilter.nf_conntrack_tcp_timeout_time_wait) [11:27:54] <_joe_> elukey: not really, we need to be a bit careful [11:28:36] (03PS2) 10Jcrespo: mariadb: Repool db1081 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437437 [11:28:52] (03PS2) 10Muehlenhoff: Enable microcode updates for Parsoid hosts [puppet] - 10https://gerrit.wikimedia.org/r/437430 (https://phabricator.wikimedia.org/T127825) [11:28:53] _joe_ not all in once of course, but the param should be set to 65 on mw iirc.. [11:28:54] (03CR) 10Giuseppe Lavagetto: "LGTM overall, just a couple minor comments. You will probably need to rebase on top of the latest changes." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/430892 (owner: 10Muehlenhoff) [11:29:04] _joe_ also it is a timeout for conntrack, I don't see a big risk [11:29:07] <_joe_> elukey: on the mediawiki hosts, you can proceed [11:29:25] ah sorry everywhere for me was mediawiki servers :D [11:29:38] (since you asked to check them) [11:29:46] (03CR) 10Muehlenhoff: [C: 032] Enable microcode updates for Parsoid hosts [puppet] - 10https://gerrit.wikimedia.org/r/437430 (https://phabricator.wikimedia.org/T127825) (owner: 10Muehlenhoff) [11:30:22] !log manually set net.netfilter.nf_conntrack_tcp_timeout_time_wait to 65 (was 120) on mw* hosts [11:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:44] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1946 bytes in 0.080 second response time [11:32:50] done [11:33:05] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1081 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437437 (owner: 10Jcrespo) [11:33:40] 10Operations, 10Citoid, 10Code-Stewardship-Reviews, 10VisualEditor, 10Services (watching): zotero translation server: code stewardship request - https://phabricator.wikimedia.org/T187194#4256700 
(10danstillman) We've gotten translation internals working in Node. We'll be hooking those up to the existing... [11:34:15] (03Merged) 10jenkins-bot: mariadb: Repool db1081 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437437 (owner: 10Jcrespo) [11:35:48] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 with low load (duration: 00m 49s) [11:35:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:44] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1942 bytes in 0.096 second response time [11:38:03] (03PS1) 10Ppchelko: Disable redis queue for cirrus search for all wikis. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437448 (https://phabricator.wikimedia.org/T189137) [11:39:11] (03CR) 10jenkins-bot: mariadb: Repool db1081 with low load after reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437437 (owner: 10Jcrespo) [11:40:41] 10Operations, 10Graphite: grafana access control - https://phabricator.wikimedia.org/T108546#4256724 (10Tgr) The non-admin web interface would work without LDAP, and did in the past; if you are not authenticated, it's effectively a read-only interface. That's the inconsistent part, IMO. [11:42:08] 10Operations, 10Analytics, 10ChangeProp, 10EventBus, and 4 others: Select candidate jobs for transferring to the new infrastucture - https://phabricator.wikimedia.org/T175210#4256726 (10Pchelolo) 05Open>03Resolved We have switched all jobs except certain outstanding problematic ones and we have tickets... 
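[editor's note] The conntrack fix discussed between 11:05 and 11:32 came down to comparing `net.netfilter.nf_conntrack_tcp_timeout_time_wait` on each host against the intended value of 65 (hosts were found at 120). A rough sketch of that comparison follows, assuming the per-host `sysctl` output has already been collected (for example via cumin) into a mapping; the helper names and the host names in the usage note are hypothetical, and this is not the command that was actually used.

```python
KEY = "net.netfilter.nf_conntrack_tcp_timeout_time_wait"
EXPECTED_TIME_WAIT = 65  # value the log says the mw* hosts should use


def parse_sysctl_line(line):
    """Split a 'name = value' line as printed by sysctl into (name, int value)."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError("not a sysctl 'name = value' line: %r" % line)
    return name.strip(), int(value.strip())


def hosts_needing_fix(outputs, key=KEY, expected=EXPECTED_TIME_WAIT):
    """Return the sorted hosts whose reported value for `key` differs from `expected`.

    `outputs` maps a host name to the single sysctl output line seen on it.
    """
    wrong = []
    for host, line in outputs.items():
        name, value = parse_sysctl_line(line)
        if name == key and value != expected:
            wrong.append(host)
    return sorted(wrong)
```

For example, given `{"host-a": KEY + " = 120", "host-b": KEY + " = 65"}` (illustrative hosts only), `hosts_needing_fix` would return `["host-a"]`. Setting the value at runtime, as was done manually here, does not address why the puppet-managed setting was not applied on stretch, which the log leaves open.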
[12:05:31] (03PS2) 10Sau226: Fixing very trivial spelling error in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 [12:06:45] !log disable graceful-switchover on dual-RE routers [12:06:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:29] 10Operations, 10Patch-For-Review: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#4256823 (10MoritzMuehlenhoff) 05Resolved>03Open There's one remaining bit we need to clean up; I'll take care of it this week: Our puppet templates for sshd still configure a DSA host k... [12:12:12] !log kartik@deploy1001 Started deploy [cxserver/deploy@0a350c3]: Update cxserver to 7fb7671 [12:12:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:26] !log kartik@deploy1001 Finished deploy [cxserver/deploy@0a350c3]: Update cxserver to 7fb7671 (duration: 01m 15s) [12:13:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:32:54] 10Operations, 10Move-Files-To-Commons, 10TCB-Team, 10Wikimedia-Extension-setup, and 2 others: Deploy FileExporter and FileImporter to group0 - https://phabricator.wikimedia.org/T195370#4224988 (10Tobi_WMDE_SW) [12:35:48] (03PS2) 10Muehlenhoff: Install HPSSA diagnostics tool on servers with HP RAID [puppet] - 10https://gerrit.wikimedia.org/r/437417 [12:36:30] (03CR) 10Muehlenhoff: [C: 032] Install HPSSA diagnostics tool on servers with HP RAID [puppet] - 10https://gerrit.wikimedia.org/r/437417 (owner: 10Muehlenhoff) [12:40:34] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/ [12:44:25] 
PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w [12:44:25] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense] [12:47:32] 10Operations, 10netops, 10Patch-For-Review: Juniper HA audit - https://phabricator.wikimedia.org/T191667#4257212 (10ayounsi) [12:47:42] 10Operations, 10netops: Enabling graceful-switchover causes core dumps on cr1-codfw - https://phabricator.wikimedia.org/T191371#4257215 (10ayounsi) 05Open>03Resolved Closing this task, following up with actions to do in T191667. [12:49:02] (03PS1) 10Ema: Move microcode hiera call to profile::base [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) [12:49:34] jouncebot, refresh [12:49:36] I refreshed my knowledge about deployments. 
[12:55:48] !log mobrovac@deploy1001 Started deploy [restbase/deploy@a39743d]: Fix transform monitoring tests [12:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w [13:00:04] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense] [13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1300). [13:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [13:00:10] Present [13:00:31] o/ [13:00:32] I can SWAT today! :D [13:00:41] go for it zeljkof :) [13:00:55] (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler02/11381/" [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema) [13:02:13] Urbanecm: ah, there is just one commit?! :D [13:02:37] Should I add more? 
:D [13:02:47] (03PS1) 10Elukey: Move the varnishkafka submodule to operations/puppet [puppet] - 10https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) [13:02:48] zeljkof, do you think that hashar can help with the mode change? [13:02:59] Urbanecm: hashar should know [13:03:07] mobrovac: my cxserver deployment failed, was it related to restbase fix? [13:03:19] Urbanecm: if he's fine with it, I'll deploy it [13:03:19] hashar, how to merge https://gerrit.wikimedia.org/r/#/c/436988/ and https://gerrit.wikimedia.org/r/#/c/436994/? Should there be a usual SWAT, [13:03:24] no kart_ [13:03:39] kart_: are you deploying from deploy1001? [13:04:12] hashar, to explain, IS.php has the executable flag starting from https://gerrit.wikimedia.org/r/#/c/404911/ [13:04:27] mobrovac: yes [13:04:29] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436989 (https://phabricator.wikimedia.org/T196223) (owner: 10Urbanecm) [13:04:32] IS.php cannot be directly executed, as it do not have #! in its start so the kernel do not know how to execute it. [13:04:36] hashar, EOM :D [13:05:08] Urbanecm: well I dont see why IS.php should have the executable flag. Surely we could add mediawiki/minus-x to the tests [13:05:17] (03PS2) 10Elukey: Move the varnishkafka submodule to operations/puppet [puppet] - 10https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) [13:05:42] (03Merged) 10jenkins-bot: Remove ruwiki from MFSpecialCaseMainPage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436989 (https://phabricator.wikimedia.org/T196223) (owner: 10Urbanecm) [13:05:51] That's my point of view as well. I think it should be just merged and pushed to the production, am I right hashar? 
[13:06:02] yup [13:06:03] (03PS3) 10Elukey: Move the varnishkafka submodule to operations/puppet [puppet] - 10https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) [13:06:21] Urbanecm: 436989 is at mwdebug1002 [13:06:46] Ok, thanks. zeljkof, I'm going to re-add the two commits to the calendar. [13:06:47] zeljkof, ack [13:07:21] Urbanecm: just make sure hashar has added a +1 to both commits ;) [13:07:47] mobrovac: looks like bug in cxserver or cxserver/deploy, so not related to anything restbase or deploy1001 [13:07:55] kk [13:08:05] zeljkof, he said "yup" above. hashar, can you please cr+1 to them as requested above please? :) [13:08:56] hashar: please review and +1 the commits if you think they are ok, and I'll deploy [13:09:01] zeljkof: what are you afraid of? the executable flag is obviously a mistake in https://gerrit.wikimedia.org/r/#/c/404911/ [13:09:16] hashar: I don't know, breaking everything :P [13:10:06] I am not familiar with how those files might be used somewhere, so a bit more cautious than needed I guess :) [13:12:07] (03PS4) 10Muehlenhoff: Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 [13:12:24] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w [13:12:24] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense] [13:13:50] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@a39743d]: Fix transform monitoring tests (duration: 18m 02s) [13:13:54] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:14:07] Urbanecm: ok to deploy 436989? [13:14:20] 10Operations, 10Dumps-Generation, 10Wikimedia-log-errors: High rate of "Memcached error .. CONNECTION FAILURE" on snapshot hosts - https://phabricator.wikimedia.org/T196303#4257424 (10ArielGlenn) I got one of these to work, that is, to fail. As user dumpsgen, strace -o strace_morejunk.txt -s 500 /usr/bin... [13:14:21] do you need more time to test it at mwdebug? or did you miss my ping :) [13:14:27] !log mobrovac@deploy1001 Started deploy [restbase/deploy@a39743d]: Fix transform monitoring tests [13:14:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:25] Urbanecm: also, I don't see the two new commits at the calendar o.O [13:16:04] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was rece [13:17:44] Urbanecm: still around? internets down? 
:) [13:18:33] (03PS2) 10Ema: Move microcode hiera call to profile::base [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) [13:18:35] (03PS7) 10Ema: prometheus: export intel-microcode information via node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) [13:18:54] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.w [13:18:54] e/media/{title}{/revision} (Get media in test page) is WARNING: Test Get media in test page responds with unexpected value at path /items[2] = Missing keys: [utitles, uthumbnail, ulicense] [13:20:24] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) is CRITICAL: Test Retrieve aggregate [13:20:24] April 29, 2016 responds with malformed body (AttributeError: NoneType object has no attribute get) [13:22:26] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@a39743d]: Fix transform monitoring tests (duration: 08m 00s) [13:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:49] !log mobrovac@deploy1001 Started deploy [restbase/deploy@a39743d]: Fix transform monitoring tests [13:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:13] Urbanecm: ok, I'll assume 
internet problems and close swat window after I deploy 436989 [13:23:22] 10Operations, 10Wikidata, 10Wikidata-Campsite, 10Wikimedia-General-or-Unknown, and 5 others: Multiple projects reporting Cannot access the database: No working replica DB server - https://phabricator.wikimedia.org/T195520#4257466 (10Lucas_Werkmeister_WMDE) [13:24:06] (03CR) 10Elukey: "LGTM, let me know when you want this to be merged :)" [puppet] - 10https://gerrit.wikimedia.org/r/437298 (https://phabricator.wikimedia.org/T178905) (owner: 10Eevans) [13:24:19] !log zfilipin@deploy1001 Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: [[gerrit:436989|Remove ruwiki from MFSpecialCaseMainPage (T196223)]] (duration: 00m 51s) [13:24:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:24] T196223: Remove ruwiki from MFSpecialCaseMainPage - https://phabricator.wikimedia.org/T196223 [13:24:29] (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler02/11382/" [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema) [13:24:38] !log EU SWAT finished [13:24:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:49] (03CR) 10Ottomata: [C: 031] "OOO, elukey great idea. I like it!" [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [13:24:51] Urbanecm: ping me if you are back during SWAT window, we can resume [13:25:57] 10Operations, 10Wikidata, 10Wikidata-Campsite, 10Wikimedia-General-or-Unknown, and 6 others: Multiple projects reporting Cannot access the database: No working replica DB server - https://phabricator.wikimedia.org/T195520#4257491 (10Addshore) a:03Addshore [13:26:01] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4257494 (10Vgutierrez) >>! 
In T196371#4255350, @Johan wrote: > The message should also clearly state that this means they won't be able to access Wik... [13:26:04] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@a39743d]: Fix transform monitoring tests (duration: 03m 15s) [13:26:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:44] (03CR) 10Ema: prometheus: export intel-microcode information via node_exporter (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema) [13:28:26] (03CR) 10Ottomata: [C: 031] Move the varnishkafka submodule to operations/puppet [puppet] - 10https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) (owner: 10Elukey) [13:33:34] (03CR) 10Alexandros Kosiaris: [C: 032] "Noop on almost all of the fleet except kubernetes boxes as expected and even there no functional changes. https://puppet-compiler.wmflabs." [puppet] - 10https://gerrit.wikimedia.org/r/437241 (owner: 10Alexandros Kosiaris) [13:33:39] (03PS2) 10Alexandros Kosiaris: Bump lvm puppet module to 1.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/437241 [13:33:41] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Bump lvm puppet module to 1.0.1 [puppet] - 10https://gerrit.wikimedia.org/r/437241 (owner: 10Alexandros Kosiaris) [13:33:48] (03CR) 10Elukey: "Noop from https://puppet-compiler.wmflabs.org/compiler02/11383/" [puppet] - 10https://gerrit.wikimedia.org/r/437467 (https://phabricator.wikimedia.org/T188377) (owner: 10Elukey) [13:37:06] (03CR) 10Muehlenhoff: Switch video scalers to a profile (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/430892 (owner: 10Muehlenhoff) [13:37:09] (03PS5) 10Muehlenhoff: Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 [13:42:37] (03CR) 10Giuseppe Lavagetto: [C: 031] Move microcode hiera call to profile::base [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) 
(owner: 10Ema) [13:43:34] PROBLEM - puppet last run on kubernetes1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:43:49] (03CR) 10Giuseppe Lavagetto: [C: 031] "I didn't check the prometheus exporter at all, but the puppet structure is ok." [puppet] - 10https://gerrit.wikimedia.org/r/436553 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema) [13:44:35] (03CR) 10jenkins-bot: Remove ruwiki from MFSpecialCaseMainPage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436989 (https://phabricator.wikimedia.org/T196223) (owner: 10Urbanecm) [13:45:37] (03PS2) 10Alexandros Kosiaris: lvm: Always force vgremoval [puppet] - 10https://gerrit.wikimedia.org/r/437242 [13:46:03] (03CR) 10Alexandros Kosiaris: [C: 032] "I don't expect this to hurt anything, I 'll merge and see what happens" [puppet] - 10https://gerrit.wikimedia.org/r/437242 (owner: 10Alexandros Kosiaris) [13:46:06] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] lvm: Always force vgremoval [puppet] - 10https://gerrit.wikimedia.org/r/437242 (owner: 10Alexandros Kosiaris) [13:46:08] (03CR) 10jerkins-bot: [V: 04-1] lvm: Always force vgremoval [puppet] - 10https://gerrit.wikimedia.org/r/437242 (owner: 10Alexandros Kosiaris) [13:55:59] !log ladsgroup@terbium:~$ mwscript deleteAutoPatrolLogs.php --wiki=commonswiki --sleep 2 --check-old [13:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:31] (03PS8) 10Andrew Bogott: wikitech: remove OpenStackManager private settings [puppet] - 10https://gerrit.wikimedia.org/r/432703 (https://phabricator.wikimedia.org/T161553) [13:58:02] (03PS2) 10Addshore: Wikidata: Always have 4 change dispatchers running [mediawiki-config] - 10https://gerrit.wikimedia.org/r/435648 (https://phabricator.wikimedia.org/T194602) (owner: 10Hoo man) [14:01:05] (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) (owner: 
10Ema) [14:06:58] (03PS14) 10Jcrespo: mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 [14:09:04] RECOVERY - puppet last run on kubernetes1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [14:09:35] !log disabling puppet on eqiad dbs for 435751 gerrit deploy [14:09:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:40] (03CR) 10Jcrespo: [C: 032] mariadb: Add extra_port on port + 20 for multiinstance hosts [puppet] - 10https://gerrit.wikimedia.org/r/435751 (owner: 10Jcrespo) [14:12:02] 10Operations, 10Dumps-Generation, 10Wikimedia-log-errors: High rate of "Memcached error .. CONNECTION FAILURE" on snapshot hosts - https://phabricator.wikimedia.org/T196303#4257638 (10ArielGlenn) From T196125: get* and getMulti* commands now take the Memcached::GET_EXTENDED flag to retrieve user flags and ca... [14:12:39] (03PS3) 10Ema: Move microcode hiera call to profile::base [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) [14:13:51] (03CR) 10Ema: [C: 032] Move microcode hiera call to profile::base [puppet] - 10https://gerrit.wikimedia.org/r/437464 (https://phabricator.wikimedia.org/T127825) (owner: 10Ema) [14:15:47] (03PS12) 10Elukey: Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [14:16:44] PROBLEM - Check systemd state on pc2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:17:14] PROBLEM - puppet last run on mw2264 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:20:34] PROBLEM - Check systemd state on db2088 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[14:23:07] (03PS1) 10Jcrespo: Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts" [puppet] - 10https://gerrit.wikimedia.org/r/437475 [14:23:15] PROBLEM - Check systemd state on db2084 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:23:17] (03PS2) 10Jcrespo: Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts" [puppet] - 10https://gerrit.wikimedia.org/r/437475 [14:23:36] apparently I am missing a semicolon somewhere [14:23:45] (03CR) 10Jcrespo: [V: 032 C: 032] Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts" [puppet] - 10https://gerrit.wikimedia.org/r/437475 (owner: 10Jcrespo) [14:23:55] PROBLEM - Check systemd state on pc2006 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:24:45] PROBLEM - Check systemd state on db2094 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:25:04] RECOVERY - Check systemd state on pc2006 is OK: OK - running: The system is fully operational [14:25:14] PROBLEM - Check systemd state on db1096 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [14:26:03] no impact, except ferm failing- firewall wasn't affected at all [14:26:08] <_joe_> oh ok [14:26:13] <_joe_> I was about to ask :) [14:26:15] RECOVERY - Check systemd state on db1096 is OK: OK - running: The system is fully operational [14:26:30] (03CR) 10Elukey: "Everything is still WIP, going through all the pcc changes, will need to tweak some stuff before being ready." 
[puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [14:26:34] that is why I disabled puppet on eqiad, too [14:27:25] RECOVERY - puppet last run on mw2264 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [14:27:35] (03CR) 10Eevans: [C: 031] "> LGTM, let me know when you want this to be merged :)" [puppet] - 10https://gerrit.wikimedia.org/r/437298 (https://phabricator.wikimedia.org/T178905) (owner: 10Eevans) [14:27:41] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4019041 (10Liuxinyu970226) Personally, KATM...'s comment give me another probably challenge: Instead of still... [14:28:03] (03PS3) 10Elukey: cassandra: update configuration for 3.11.2 [puppet] - 10https://gerrit.wikimedia.org/r/437298 (https://phabricator.wikimedia.org/T178905) (owner: 10Eevans) [14:28:38] (03CR) 10Elukey: [C: 032] cassandra: update configuration for 3.11.2 [puppet] - 10https://gerrit.wikimedia.org/r/437298 (https://phabricator.wikimedia.org/T178905) (owner: 10Eevans) [14:29:16] urandom: merged! [14:29:19] RECOVERY - Check systemd state on db2088 is OK: OK - running: The system is fully operational [14:29:43] literally, a missing ; that the puppet compiler didn't catch [14:29:51] (03PS1) 10Jcrespo: Revert "Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts"" [puppet] - 10https://gerrit.wikimedia.org/r/437476 [14:30:35] elukey: thank you sir! 
[14:32:09] (03PS2) 10Jcrespo: Revert "Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts"" [puppet] - 10https://gerrit.wikimedia.org/r/437476 [14:32:57] (03CR) 10Jcrespo: [C: 032] Revert "Revert "mariadb: Add extra_port on port + 20 for multiinstance hosts"" [puppet] - 10https://gerrit.wikimedia.org/r/437476 (owner: 10Jcrespo) [14:42:33] !log reenabling puppet on all databases [14:42:34] (03PS13) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [14:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:44:19] RECOVERY - Check systemd state on pc2004 is OK: OK - running: The system is fully operational [14:47:30] RECOVERY - Check systemd state on db2084 is OK: OK - running: The system is fully operational [14:49:09] RECOVERY - Check systemd state on db2094 is OK: OK - running: The system is fully operational [14:51:38] !log installing fixed kernels/microcode for spectre v2 on labvirt* [14:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:50] !log restarting restbase-dev1004 - T178905 [14:52:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:54] T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905 [14:55:49] !log Downtime labsdb1005 and labsdb1004 for maintenance on labsdb1005 - T195313 [14:55:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:51] (03PS14) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [15:04:45] (03PS1) 10Giuseppe Lavagetto: Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 [15:08:08] (03CR) 10Gilles: [C: 031] Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 
10Giuseppe Lavagetto) [15:08:43] 10Operations, 10Maps-Sprint, 10Maps (Tilerator): Externalize tile storage for maps - https://phabricator.wikimedia.org/T196474#4257803 (10Gehel) [15:08:44] (03PS15) 10Elukey: [WIP] Create profile::analytics::cluster::packages::* classes [puppet] - 10https://gerrit.wikimedia.org/r/436012 (owner: 10Ottomata) [15:17:16] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4257846 (10Papaul) Will be receiving a main board replacement . Enterprise Service Request 966671345 [15:17:41] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4257847 (10Papaul) Good morning Papaul, I set up a dispatch for a replacement motherboard and daughter card set for parts only as requested. I would recommend replacing the daughter card first... [15:18:59] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4257848 (10jcrespo) yay! for the replacements, sorry to create you more work. [15:21:01] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257853 (10Papaul) @Marostegui since db2067 is out of service, would you like for me to take one disk from it and replace the failed disk on db2047? [15:23:24] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: pc2005 down - https://phabricator.wikimedia.org/T196339#4257855 (10Papaul) @jcrespo you didn't create me more work, it is my work lol [15:23:56] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257856 (10Marostegui) @Papaul db2067 is online, it is not out of service. You sure you meant db2067? 
[15:24:25] !log Stop MySQL on labsdb1005 [15:24:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:31] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4251009 (10jcrespo) He meant (probably) db2064 [15:25:40] marostegui: try unmounting /srv there [15:25:49] I don't trust there is something else going on there [15:25:53] yeah [15:26:17] in the past I discovered screen sessions [15:26:30] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257862 (10Papaul) db2064 thanks @jcrespo [15:28:28] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257863 (10Marostegui) Sure then, if the disks are the same (600GB) :-) [15:30:00] !log reboot labsdb1005 for upgrades T195313 [15:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:42] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257873 (10Papaul) a:05Papaul>03Marostegui Disk replacement complete. [15:32:49] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4257877 (10Marostegui) Thanks! ``` physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Rebuilding) ``` [15:33:11] jouncebot: next [15:33:11] In 0 hour(s) and 26 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1600) [15:33:13] jouncebot: now [15:33:13] No deployments scheduled for the next 0 hour(s) and 26 minute(s) [15:34:14] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 [15:34:36] I actually was going to deploy something, were you, Reedy? [15:34:57] jynus: Not necessarily. 
_joe_ wants a patch deploying, just wondering what was going on atm :) [15:36:02] (03PS3) 10Jcrespo: Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 [15:36:04] (03CR) 10Reedy: [C: 031] "It's basically always false now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [15:36:24] ok, will wait [15:36:41] jynus: feel free to go ahead [15:36:52] ok, doing, then [15:37:03] (03CR) 10Jcrespo: [C: 032] Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [15:37:13] oops [15:37:16] wrong patch [15:37:21] haha [15:37:30] leave it in place on tin, I'll deploy it after you're done [15:37:41] uh, mwdeploy1001 [15:37:42] tin? are you from the past? [15:37:43] :) [15:37:45] no problem [15:37:49] it is not merged [15:37:52] I just voted [15:37:56] and then reverted [15:38:01] marostegui: many years of that being the deploy host... [15:38:10] Muscle memory will take a while to fix [15:38:17] (03CR) 10Jcrespo: [C: 031] "Sorry, pressed the wrong button." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [15:39:02] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 (owner: 10Jcrespo) [15:40:11] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 (owner: 10Jcrespo) [15:40:25] (03PS2) 10Jcrespo: Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [15:41:31] hey, a really rookie question here - `wmf-config/InitialiseSettings-labs.php` file has a set of config flags for beta cluster. some of those are prefixed with -, others not, also some config changes have wmg, others wg -> can someone explain to me what those are (both - and wmg|wg) ?
[15:41:54] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 fully (duration: 00m 49s) [15:41:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:34] !log reindexing Slovak wikis on elastic@codfw (T191545) [15:42:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:44] T191545: Re-index Slovak Wikis after analysis chain is deployed - https://phabricator.wikimedia.org/T191545 [15:43:47] (03CR) 10Muehlenhoff: [C: 031] "Ack, looks good." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [15:44:08] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1080 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437438 (owner: 10Jcrespo) [15:45:11] ok, I found meaning of `-`, but the wmg and wg is still a question ;/ [15:46:46] (03PS6) 10Giuseppe Lavagetto: Switch video scalers to a profile [puppet] - 10https://gerrit.wikimedia.org/r/430892 (owner: 10Muehlenhoff) [15:46:48] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::jobrunner: manage both videoscaler, jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/437490 [15:46:50] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::videoscaler: remove global Timeout setting [puppet] - 10https://gerrit.wikimedia.org/r/437491 [15:46:52] (03PS1) 10Giuseppe Lavagetto: jobrunner: add profile::mediawiki::videoscaler [puppet] - 10https://gerrit.wikimedia.org/r/437492 [15:46:55] (03PS1) 10Giuseppe Lavagetto: videoscaler/jobrunner: add the respective VIPs [puppet] - 10https://gerrit.wikimedia.org/r/437493 [15:46:57] (03PS1) 10Giuseppe Lavagetto: conftool-data: merge the jobrunner, videoscaler clusters [puppet] - 10https://gerrit.wikimedia.org/r/437494 [15:47:16] <_joe_> moritzm: this series ^^ will allow to merge videoscalers and jobrunners [15:47:30] <_joe_> I'd say I'll try doing part of that tomorrow [15:47:31] (03CR) 10jerkins-bot: [V: 04-1] profile::mediawiki::jobrunner: manage both 
videoscaler, jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/437490 (owner: 10Giuseppe Lavagetto) [15:48:10] _joe_: sounds great, I'm in an interview now and off when it's done, but will look over the patches tomorrow [15:48:26] <_joe_> yeah, no rush, I'm going off in a few as well [15:49:13] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437436 [15:50:05] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host for fr-tech - https://phabricator.wikimedia.org/T196476#4257980 (10Papaul) [15:53:24] _joe_, could I borrow 5 minutes? you might know the answer to my question above [15:54:05] in `wmf-config/InitialiseSettings-labs.php` some configs are prefixed with wg, others with wmg. I cannot find what that means [15:55:52] haha [15:55:59] raynor: Legacy reasons, mostly [15:56:14] (03Abandoned) 10Jcrespo: WIP: setup s8 and db-common.php refactoring [mediawiki-config] - 10https://gerrit.wikimedia.org/r/391198 (https://phabricator.wikimedia.org/T177208) (owner: 10Jcrespo) [15:56:17] ok, so if I want to change something which prefix should I use?
[15:56:24] Depends what you're wanting to change [15:56:34] wg = wiki global, wmg = wikimedia global [15:56:57] usually, we have a pattern in CommonSettings of doing $wgFoo = $wmgFoo; [15:57:14] ok, so couple days ago I wanted to change something on beta cluster [15:57:27] on this wiki - https://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page [15:57:33] https://gerrit.wikimedia.org/r/#/c/436596/2/wmf-config/InitialiseSettings-labs.php [15:57:36] this is the change I did [15:58:09] thats the config I'm overriding: https://github.com/wikimedia/mediawiki-extensions-Popups/blob/master/extension.json#L68 [15:58:18] and it doesn't work, I have no idea why [15:58:19] PROBLEM - Host labvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% [15:58:22] !log upgrade Cassandra to 3.11.2, restbase-dev1004 - T178905 [15:58:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:58:27] T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905 [15:58:45] So, the - prefix has special meaning... [15:58:57] But that aside, wmgFoo isn't used in an extension [15:59:16] It's also not defined/used in CommonSettings/InitialiseSettings [16:00:04] godog, moritzm, and _joe_: Time to snap out of that daydream and deploy Puppet SWAT(Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1600). [16:00:04] No GERRIT patches in the queue for this window AFAICS. [16:00:50] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4258028 (10Cmjohnson) @bstorm I replaced the cable and the sfp-t just in case and labvirt1019 is in the installer now [16:01:22] raynor: Basically. 
Because that config is not set in CommonSettings/InitialiseSettings for production, you don't need the - and you don't need the m [16:01:41] yes, the PopupsOptInStateForNewAccounts is not defined in InitialiseSettings.php yet, it uses the default value from extension.json [16:02:21] right now it's not defined, but in the near future we want to change it to "1" for different wikis [16:02:27] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437436 (owner: 10Jcrespo) [16:02:47] So, you just add a dblist or a dbname in the same array, after the default, and it'll just work [16:03:41] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437436 (owner: 10Jcrespo) [16:03:47] ah, ok, so if I remove the `-` and `m` it should start working on https://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page (even though that config is not defined in the InitialiseSettings.php file) [16:03:53] Yup [16:04:08] For historic reasons, it was needed with how we included extensions [16:04:18] InitialiseSettings would be loaded, but then the php config would be loaded in, overriding this [16:04:28] RECOVERY - Device not healthy -SMART- on db2047 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2047&var-datasource=codfw%2520prometheus%252Fops [16:04:34] So we'd define wmgFoo = bar, load the extension, then set wgFoo = wmgFoo [16:04:40] But we don't need to do that (in most cases) [16:04:50] The config will be sorted out automatically [16:05:33] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 fully (duration: 00m 49s) [16:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:05:43] raynor: The other reason to use wmg is if it's config just for wmf config...
So wmgEnableMyExtension rather than wgEnableMyExtension [16:05:56] ok, let me create the config, Reedy could you look at before it gets merged? [16:06:02] Yeah, I can [16:06:07] awesome, thx [16:06:32] (03PS3) 10Reedy: Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [16:06:35] (03CR) 10Reedy: [C: 032] Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [16:07:16] !log starting branch cut for 1.32.0-wmf.7 [16:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:47] (03Merged) 10jenkins-bot: Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [16:09:31] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437436 (owner: 10Jcrespo) [16:10:15] (03PS1) 10Pmiazga: beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) [16:10:19] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: Remove if file_exists for /etc/wikimedia-scaler (duration: 00m 51s) [16:10:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:21:08] (03CR) 10Jforrester: [C: 031] "This should be fine." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [16:22:12] (03PS1) 10Rduran: [WIP] Add unit tests for transfer.py and CumminExecution [software/wmfmariadbpy] - 10https://gerrit.wikimedia.org/r/437503 [16:29:23] 10Operations, 10ops-codfw, 10fundraising-tech-ops: frdb2001 RAID disk failure - https://phabricator.wikimedia.org/T196251#4258094 (10Papaul) a:05Papaul>03Jgreen @Jgreen disk replacement complete [16:40:45] 10Operations, 10ops-codfw: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477#4258119 (10RobH) p:05Triage>03Normal [16:42:18] 10Operations, 10ops-eqiad: rack/setup/install backup1001 - https://phabricator.wikimedia.org/T196478#4258137 (10RobH) p:05Triage>03Normal [16:53:18] (03PS1) 10Thcipriani: Group0 to 1.32.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437508 (https://phabricator.wikimedia.org/T191053) [16:55:26] Reedy, https://gerrit.wikimedia.org/r/#/c/437502/ [16:56:15] !log upgrade Cassandra to 3.11.2, restbase-dev1005 - T178905 [16:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:21] T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905 [16:58:27] !log reindexing Slovak wikis on elastic@eqiad (T191545) [16:58:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:32] T191545: Re-index Slovak Wikis after analysis chain is deployed - https://phabricator.wikimedia.org/T191545 [16:58:51] (03CR) 10Ema: VCL: Normalise the Accept-Language header for the REST API (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) (owner: 10Mobrovac) [16:58:55] (03CR) 10Reedy: [C: 04-1] "Minor issue, unrelated to the change, but related to the value being set as a default" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 
(https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [17:00:05] cscott, arlolra, subbu, halfak, and Amir1: (Dis)respected human, time to deploy Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1700). Please do the needful. [17:00:53] (03PS2) 10Pmiazga: beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) [17:00:56] !log upgrade Cassandra to 3.11.2, restbase-dev1006 - T178905 [17:01:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:03] no parsoid deploy today [17:04:28] (03CR) 10Pmiazga: "thanks for catching this. PHP will cast that value to string anyway so it shouldn't have any effect. The config change is a string as prev" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [17:14:30] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on labsdb1009 - https://phabricator.wikimedia.org/T195690#4258274 (10Cmjohnson) The report was sent to HP yesterday, i have not heard back from them yet. If I don't get something in the next few hours I will ping them. [17:17:04] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on labsdb1009 - https://phabricator.wikimedia.org/T195690#4258276 (10Marostegui) Thanks for the update! [17:18:13] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1034 - https://phabricator.wikimedia.org/T195569#4258278 (10Cmjohnson) @volans I have a new disk for ms-be1034. Let me know if you want to replace it? Confirm it's disk1 please. 
Thanks [17:18:41] (03PS4) 10Gehel: elasticsearch: alert when cirrus writes are frozen for too long [puppet] - 10https://gerrit.wikimedia.org/r/431754 (https://phabricator.wikimedia.org/T193605) [17:22:28] (03CR) 10Reedy: [C: 031] "LGTM now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [17:30:07] 10Operations, 10ops-codfw: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483#4258321 (10RobH) p:05Triage>03Normal [17:30:11] 10Operations, 10ops-eqiad: rack/setup/install graphite1004 - https://phabricator.wikimedia.org/T196484#4258336 (10RobH) p:05Triage>03Normal [17:31:16] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485#4258356 (10Smalyshev) [17:31:20] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4258369 (10Cmjohnson) 05Open>03Resolved Resolving this, please open again if you're still having issues. 
[17:31:44] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485#4258371 (10Smalyshev) p:05Triage>03High [17:32:26] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485#4258384 (10Gehel) We have a "sleeping" task to order new disks: T186526 [17:34:17] PROBLEM - Host labnet1004.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [17:34:33] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485#4258389 (10Smalyshev) [17:34:58] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1966 bytes in 0.117 second response time [17:37:09] 10Operations, 10ops-eqiad, 10netops: upgrade row d to have 3 10G switches - https://phabricator.wikimedia.org/T196487#4258394 (10RobH) p:05Triage>03Normal [17:39:37] RECOVERY - Host labnet1004.mgmt is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [17:40:43] 10Operations, 10ops-eqiad: Degraded RAID on ms-be1034 - https://phabricator.wikimedia.org/T195569#4258424 (10Volans) @Cmjohnson which disk is a tricky question in this case. From @akosiaris SAL it seemed it was `0:1:0:1` that was corresponding to `sdb` at the best of my knowledge from the `kern.log` of the la... 
[17:44:58] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host for fr-tech - https://phabricator.wikimedia.org/T196476#4258430 (10Papaul) [17:45:08] !log upgrade Cassandra to 3.11.2, restbase2001 (canary) - T178905 [17:45:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:12] T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905 [17:45:54] (03PS4) 10Mobrovac: VCL: Normalise the Accept-Language header for the REST API [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) [17:46:19] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G - https://phabricator.wikimedia.org/T196489#4258445 (10RobH) p:05Triage>03Normal [17:46:33] 10Operations, 10ops-codfw, 10netops: upgrade all codfw switch stacks to include additional 10G switch per row - https://phabricator.wikimedia.org/T196489#4258445 (10RobH) [17:46:35] 10Operations, 10ops-eqiad: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196490#4258459 (10ops-monitoring-bot) [17:47:11] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host for fr-tech - https://phabricator.wikimedia.org/T196476#4258463 (10Jgreen) [17:47:27] (03CR) 10Mobrovac: VCL: Normalise the Accept-Language header for the REST API (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/434558 (https://phabricator.wikimedia.org/T195327) (owner: 10Mobrovac) [17:48:37] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4258468 (10Papaul) [17:49:00] 10Operations, 10ops-codfw, 10fundraising-tech-ops: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4258469 (10RobH) [17:49:17] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install 
Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4257945 (10Papaul) [17:49:28] PROBLEM - cassandra-a CQL 10.192.16.162:9042 on restbase2001 is CRITICAL: connect to address 10.192.16.162 and port 9042: Connection refused [17:49:37] PROBLEM - cassandra-a SSL 10.192.16.162:7001 on restbase2001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [17:49:51] ^^^ that's me [17:50:00] * urandom should have put that under maintenance [17:50:12] 10Operations, 10ops-codfw, 10fundraising-tech-ops: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4258476 (10Papaul) [17:50:22] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4258478 (10Jgreen) [17:52:47] RECOVERY - cassandra-a CQL 10.192.16.162:9042 on restbase2001 is OK: TCP OK - 0.036 second response time on 10.192.16.162 port 9042 [17:52:57] RECOVERY - cassandra-a SSL 10.192.16.162:7001 on restbase2001 is OK: SSL OK - Certificate restbase2001-a valid until 2018-08-17 16:11:39 +0000 (expires in 72 days) [17:53:07] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196490#4258498 (10Marostegui) This disk has been replaced by @Cmjohnson as we were coming from: T195444#4230827 [17:53:38] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196490#4258505 (10Marostegui) p:05Triage>03Normal [17:55:45] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1065 - https://phabricator.wikimedia.org/T196490#4258525 (10Marostegui) ``` root@db1065:~# megacli -PDRbld -ShowProg -PhysDrv [32:1] -aALL Rebuild Progress on Device at Enclosure 32, Slot 1 Completed 56% in 31 Minutes. 
``` [18:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1800) [18:00:11] 10Operations, 10ops-codfw, 10DNS, 10Traffic: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4258543 (10RobH) p:05Triage>03Normal [18:03:43] (03PS5) 10Krinkle: profiler-labs: Use FlameGraph-compatible format for xhprof sampler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916) [18:04:38] 10Operations, 10ops-eqiad, 10Cloud-VPS: labnet1003 and labnet1004 moving and enabling 10G NICs - https://phabricator.wikimedia.org/T193196#4258579 (10Cmjohnson) labnet1003 is already in B7 and the 10G ports are connected to the new switch and the descriptions were updated @ayounsi will need to configure the... [18:06:20] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T196246#4258595 (10Marostegui) 05Open>03Resolved All good! ``` logicaldrive 1 (3.3 TB, RAID 1+0, OK) physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK) physicaldrive 1I:1:2 (port 1I:b... [18:06:47] 10Operations, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests, 10Patch-For-Review: Add Reedy to contint-docker group - https://phabricator.wikimedia.org/T196192#4249720 (10RobH) If no objections are noted, I'll be merging this live tomorrow, Wednesday, 2018-06-06. [18:06:58] (03PS6) 10Krinkle: profiler-labs: Use FlameGraph-compatible format for xhprof sampler [mediawiki-config] - 10https://gerrit.wikimedia.org/r/434522 (https://phabricator.wikimedia.org/T176916) [18:07:45] * Krinkle staging on mwdebug1001 [18:07:50] s/staging/testing/ [18:08:53] 10Operations, 10ops-eqiad, 10cloud-services-team: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T194907#4258601 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson The battery was wrong and has been replaced. 
[18:10:40] 10Operations, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests, 10Patch-For-Review: Add Reedy to contint-docker group - https://phabricator.wikimedia.org/T196192#4249720 (10hashar) +1 [18:12:37] RECOVERY - Host labvirt1019 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [18:12:47] RECOVERY - HP RAID on db2047 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [18:13:30] 10Operations, 10Cloud-Services, 10Cloud-VPS, 10IPv6: Enable ipv6 on labs - https://phabricator.wikimedia.org/T37947#4258615 (10stwalkerster) >>! In T37947#2449718, @FastLizard4 wrote: > Bumping this task. I hate to bump this again, but is there any plan for this to happen in the near/mid/distant future? I... [18:13:39] PROBLEM - ensure kvm processes are running on labvirt1019 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [18:13:46] (03CR) 10jenkins-bot: Remove useless file_exists() call [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437481 (owner: 10Giuseppe Lavagetto) [18:14:14] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4258618 (10Aklapper) @Liuxinyu970226 : How is that related to the topic of this task...? [18:15:32] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1020 - https://phabricator.wikimedia.org/T194855#4210955 (10Cmjohnson) I need a SSD report HP. Could you please install hpssaducli and run the report and email me the zip file. Kindly provide us the Smart Wear Gauge report so that we can check if the drive h... 
[18:16:03] 10Operations, 10DNS, 10Release-Engineering-Team, 10Traffic, and 2 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776#4258622 (10greg) @Varnent current timeline for this? I'm trying to determine who can do it. [18:17:59] RECOVERY - ensure kvm processes are running on labvirt1019 is OK: PROCS OK: 1 process with regex args qemu-system-x86_64 [18:22:29] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4258645 (10Bstorm) 05Resolved>03Open labvirt1019 still has a dead eth1. ``` root@labvirt1019:~# ip link show 1: lo: PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 224, down: 1, dormant: 0, excluded: 0, unused: 0 [18:27:27] PROBLEM - Host mw1280.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [18:31:18] 10Puppet, 10Beta-Cluster-reproducible, 10Patch-For-Review: puppet failures due to "Could not find class" or "Puppet::Parser::AST::Resource failed with error ArgumentError: Invalid resource type" - https://phabricator.wikimedia.org/T131946#4258700 (10Krenair) Has anyone seen this problem recently? 
[18:32:47] 10Operations, 10ops-eqiad: mw1280: CPU error - https://phabricator.wikimedia.org/T195734#4258708 (10Cmjohnson) Swapped the CPU's today to see if error follows [18:32:57] RECOVERY - Host mw1280.mgmt is UP: PING WARNING - Packet loss = 66%, RTA = 1.60 ms [18:34:17] RECOVERY - Host mw1280 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [18:34:57] 10Operations, 10ops-eqiad: rack/setup/install backup1001 - https://phabricator.wikimedia.org/T196478#4258714 (10Cmjohnson) [18:36:08] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1948 bytes in 0.105 second response time [18:36:32] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review: Connect or troubleshoot eth1 on labvirt1019 and labvirt1020 - https://phabricator.wikimedia.org/T194964#4258730 (10Cmjohnson) @bstorm oh..sorry I think i just ran labvirt1019 back through the installer [18:44:56] 10Operations, 10ops-codfw, 10fundraising-tech-ops: rack/setup/install Prometeuse/Grafana host frmon2001 for fr-tech - https://phabricator.wikimedia.org/T196476#4258776 (10Papaul) [18:45:02] 10Operations, 10ops-codfw, 10fundraising-tech-ops: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4258781 (10Papaul) [18:53:37] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1971 bytes in 0.095 second response time [18:58:46] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1958 bytes in 0.062 second response time [18:58:56] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on labsdb1009 - https://phabricator.wikimedia.org/T195690#4258824 (10Cmjohnson) This is Regarding the Case Number 5329764075 for HPE ProLiant DL380 Gen9 8SFF Configure-to-order Server Issue: SCM_HW:Failed Hard Drive Thanks a lot for sharing the Smart Wear G... 
[19:00:05] thcipriani: That opportune time is upon us again. Time for a MediaWiki train deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T1900). [19:03:26] PROBLEM - mediawiki-installation DSH group on mw1280 is CRITICAL: Host mw1280 is not in mediawiki-installation dsh group [19:08:56] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/ [19:25:32] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4258918 (10Johan) So we plan to do this gradually, like we did with the 3DES deprecation, right? [19:26:07] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1965 bytes in 0.072 second response time [19:29:48] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4258925 (10BBlack) Yes, but on a shorter timescale. The 3DES one, in retrospect, dragged on a bit longer than it had to, and in this case (a) We're... [19:35:21] !log upgrade Cassandra to 3.11.2, restbase1007-a (canary) - T178905 [19:35:23] thcipriani: do i have time to sneak in a quick mobileapps deployment before the train rolls? 
(shouldn't matter, just nice to keep one thing happening at a time, i suppose) [19:35:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:26] T178905: Upgrade RESTBase cluster to Cassandra release: 3.11.2 - https://phabricator.wikimedia.org/T178905 [19:35:27] 10Operations, 10ops-codfw, 10fundraising-tech-ops: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4258928 (10Papaul) a:03Papaul [19:35:36] mdholloway: sure thing, go for it [19:35:42] thcipriani: sweet, thanks [19:35:52] 10Operations, 10ops-codfw, 10fundraising-tech-ops: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4255460 (10Papaul) [19:37:57] (03PS1) 10Papaul: DNS: Add mgmt DNS entries for frbast2001 [dns] - 10https://gerrit.wikimedia.org/r/437537 (https://phabricator.wikimedia.org/T196417) [19:41:27] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1961 bytes in 0.069 second response time [19:41:56] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/ [19:42:11] (03CR) 10Jgreen: [C: 032] DNS: Add mgmt DNS entries for frbast2001 [dns] - 10https://gerrit.wikimedia.org/r/437537 (https://phabricator.wikimedia.org/T196417) (owner: 10Papaul) [19:43:44] (03PS1) 10Papaul: DNS: Add prod DNS entries for frbast2001 [dns] - 10https://gerrit.wikimedia.org/r/437539 (https://phabricator.wikimedia.org/T196417) [19:45:15] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Port redis statistics to Prometheus - 
https://phabricator.wikimedia.org/T148637#2728497 (10Krenair) It has come up while reviewing cherry-picks used in deployment-prep that https://gerrit.wikimedia.org/r/386869 was not me... [19:45:52] 10Operations, 10Mail, 10Patch-For-Review: Upgrade mx1001/mx2001 to stretch - https://phabricator.wikimedia.org/T175361#4258975 (10herron) Ok, we should be in good shape to depool mx1001, relay any deferred messages to mx2001, and rebuild mx1001 with stretch. @ayounsi is there a morning (PDT) that works for... [19:46:45] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4258983 (10Papaul) [19:49:13] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4258991 (10Johan) OK. So I'll ask for //Wikipedia is making the site more secure. You are using an old web browser that won't be able to connect to... [19:53:05] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: switch port configuration for frbackup2001 - https://phabricator.wikimedia.org/T196503#4258998 (10Papaul) [19:53:25] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: switch port configuration for frbackup2001 - https://phabricator.wikimedia.org/T196503#4259010 (10Papaul) [19:57:32] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417#4259017 (10Papaul) a:05Papaul>03Jgreen @Jgreen this is done at my end. Once the switch port configuration is done it is all yours. 
Thanks [19:57:50] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@7ecc3b6]: Update mobileapps to 66727b7 [19:57:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:03:11] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@7ecc3b6]: Update mobileapps to 66727b7 (duration: 05m 21s) [20:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:59] (03PS1) 10Dzahn: dumps: add phab1002 as second phab server [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) [20:11:47] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve selected the events for Jan 01) timed out before a response was received: /en.wikipedia.org/v1/transform/wikitext/to/html{/title}{/revision} (Transform wikitext to html) is WARNING: Test Transform wikitext to html responds with unexpected body: h2 id=HeadingHeading/h2 != /^h2.* Heading \/h2/ [20:17:08] ACKNOWLEDGEMENT - HP RAID on labvirt1019 is CRITICAL: CRITICAL: Slot 0: no logical drives --- Slot 0: no drives --- Slot 1: OK: 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:1:1, 2I:1:2, 2I:1:3, 2I:1:4, 2I:2:1, 2I:2:2 - Controller: OK - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T196507 [20:17:14] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T196507#4259081 (10ops-monitoring-bot) [20:19:18] 10Operations, 10ops-eqiad, 10Patch-For-Review: setup/install phab1002(WMF4727) - https://phabricator.wikimedia.org/T196019#4244341 (10ArielGlenn) profile::dumps::distribution::datasets::fetcher is what you want to fix. I need to go through and see which rsync confs I can kill. Later. 
[20:21:29] (03PS1) 10Dzahn: mariadb: add phab1002 to phabricator grants [puppet] - 10https://gerrit.wikimedia.org/r/437613 (https://phabricator.wikimedia.org/T196019) [20:21:48] (03PS1) 10Ottomata: Kafka - Change default message.timestamp.type to CreateTime [puppet] - 10https://gerrit.wikimedia.org/r/437614 (https://phabricator.wikimedia.org/T196407) [20:23:46] !log thcipriani@deploy1001 Started scap: testwiki to php-1.32.0-wmf.7 and rebuild l10n cache [20:23:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:16] (03PS1) 10Dzahn: hiera/phabricator: add phab1002 as phab server [puppet] - 10https://gerrit.wikimedia.org/r/437615 (https://phabricator.wikimedia.org/T196019) [20:24:30] (03CR) 10Ppchelko: [C: 031] Kafka - Change default message.timestamp.type to CreateTime [puppet] - 10https://gerrit.wikimedia.org/r/437614 (https://phabricator.wikimedia.org/T196407) (owner: 10Ottomata) [20:35:44] (03PS2) 10Dzahn: dumps: add phab1002 as second phab server [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) [20:42:53] (03PS1) 10Dzahn: switch phabricator from phab1001 to phab1002 [puppet] - 10https://gerrit.wikimedia.org/r/437620 (https://phabricator.wikimedia.org/T196019) [20:47:31] (03PS3) 10Dzahn: dumps: add phab1002 as second phab server [puppet] - 10https://gerrit.wikimedia.org/r/437558 (https://phabricator.wikimedia.org/T196019) [20:49:47] (03CR) 10Herron: [C: 04-1] "Thanks for this! Certainly a good thing to be doing in concept." [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [20:52:49] (03PS1) 10Dzahn: mtail: replace phab1001 with phab1002? 
[puppet] - 10https://gerrit.wikimedia.org/r/437625 (https://phabricator.wikimedia.org/T196019) [21:07:39] (03CR) 10Krinkle: [C: 031] Change mode of IS.php and a few of other files to 644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [21:29:11] !log thcipriani@deploy1001 Finished scap: testwiki to php-1.32.0-wmf.7 and rebuild l10n cache (duration: 65m 25s) [21:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:56] (03CR) 10Thcipriani: [C: 032] Group0 to 1.32.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437508 (https://phabricator.wikimedia.org/T191053) (owner: 10Thcipriani) [21:38:16] (03Merged) 10jenkins-bot: Group0 to 1.32.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437508 (https://phabricator.wikimedia.org/T191053) (owner: 10Thcipriani) [21:39:11] (03CR) 10jenkins-bot: Group0 to 1.32.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437508 (https://phabricator.wikimedia.org/T191053) (owner: 10Thcipriani) [21:41:23] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.32.0-wmf.7 [21:41:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:53:59] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253860 (10Platonides) The second instance of Wikipedia should be replaced with the project name in each case. I don't like the explanation "Wikipedi... [22:00:04] dmaza and MaxSem: (Dis)respected human, time to deploy Debug anon cookie blocks (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T2200). Please do the needful. [22:00:53] lol, I didn't know about this [22:11:03] 10Operations, 10Dumps-Generation, 10Wikimedia-log-errors: High rate of "Memcached error .. 
CONNECTION FAILURE" on snapshot hosts - https://phabricator.wikimedia.org/T196303#4259403 (10aaron) >>! In T196303#4257638, @ArielGlenn wrote: > From T196125: get* and getMulti* commands now take the Memcached::GET_EXT... [22:15:26] (03PS1) 10MaxSem: Reenable $wgCookieSetOnIpBlock on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437638 [22:15:35] (03PS2) 10MaxSem: Reenable $wgCookieSetOnIpBlock on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437638 [22:17:46] (03CR) 10MaxSem: [C: 032] Reenable $wgCookieSetOnIpBlock on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437638 (owner: 10MaxSem) [22:19:00] (03Merged) 10jenkins-bot: Reenable $wgCookieSetOnIpBlock on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437638 (owner: 10MaxSem) [22:19:25] (03CR) 10jenkins-bot: Reenable $wgCookieSetOnIpBlock on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437638 (owner: 10MaxSem) [22:20:49] !log maxsem@deploy1001 Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/437638/ (duration: 00m 57s) [22:20:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:26:17] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1947 bytes in 0.063 second response time [22:29:17] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4259418 (10Johan) The first sentences should be really simple – they're mainly aimed at people who use old, outdated browsers, who might not get a tr... 
[22:32:24] (03PS2) 10Alex Monk: Prepare to tighten Puppet DB access control - check client certificates [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) [22:32:26] (03PS1) 10Alex Monk: Tighten Puppet DB access control - check client certificates [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) [22:33:18] (03CR) 10jerkins-bot: [V: 04-1] Tighten Puppet DB access control - check client certificates [puppet] - 10https://gerrit.wikimedia.org/r/437640 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [22:36:10] (03CR) 10Alex Monk: "Good point. PS2 turns this changeset into the requesting-certs-and-logging part and there's a new child changeset that can handle actual a" [puppet] - 10https://gerrit.wikimedia.org/r/437057 (https://phabricator.wikimedia.org/T194962) (owner: 10Alex Monk) [22:41:37] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1972 bytes in 0.075 second response time [22:44:58] (03PS3) 10Reedy: Collapse PHP_SAPI conditionals down into one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 [22:46:02] (03CR) 10Reedy: [C: 032] Collapse PHP_SAPI conditionals down into one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 (owner: 10Reedy) [22:47:17] (03Merged) 10jenkins-bot: Collapse PHP_SAPI conditionals down into one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 (owner: 10Reedy) [22:49:17] (03CR) 10jenkins-bot: Collapse PHP_SAPI conditionals down into one [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 (owner: 10Reedy) [22:49:20] (03CR) 10Jforrester: "Nice." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/393355 (owner: 10Reedy) [22:50:48] !log reedy@deploy1001 Synchronized wmf-config/CommonSettings.php: Simplify PHP_SAPI conditionals (duration: 00m 57s) [22:50:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:51:56] (03PS3) 10Reedy: Change mode of IS.php and a few of other files to 644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [22:51:59] (03CR) 10Reedy: [C: 032] Change mode of IS.php and a few of other files to 644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [22:52:33] Reedy: https://gerrit.wikimedia.org/r/#/c/436994/ too? [22:53:52] (03Merged) 10jenkins-bot: Change mode of IS.php and a few of other files to 644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [22:54:18] Needs a rebase :P [22:54:26] (03PS3) 10Reedy: beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [22:54:35] (03CR) 10Reedy: [C: 032] beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [22:54:45] (03CR) 10jenkins-bot: Change mode of IS.php and a few of other files to 644 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436988 (https://phabricator.wikimedia.org/T196225) (owner: 10Urbanecm) [22:55:13] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4259443 (10Johan) Will this message be the same for the non-Wikipedia wikis? 
[22:55:54] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4253860 (10Reedy) >>! In T196371#4259443, @Johan wrote: > Will this message be the same for the non-Wikipedia wikis? Yeah, all projects [22:56:18] (03Merged) 10jenkins-bot: beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [22:56:30] (03PS4) 10Jforrester: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: 10Mainframe98) [22:57:05] (03PS5) 10Reedy: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: 10Mainframe98) [22:57:08] Haha [22:57:21] !log temporarily disable ferm on elastic1018 to test theory [22:57:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:57:35] (03CR) 10Reedy: [C: 032] Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: 10Mainframe98) [22:57:50] !log temporarily disable ferm on elastic1018 and logstash1007 to test theory [22:57:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:58:55] (03Merged) 10jenkins-bot: Add Minus-X to check against files that shouldn't be executable [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436994 (https://phabricator.wikimedia.org/T196225) (owner: 10Mainframe98) [22:59:07] (03PS3) 10Reedy: Fixing very trivial spelling error in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 (owner: 10Sau226) [22:59:13] (03CR) 10Reedy: [C: 032] Fixing very trivial spelling error 
in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 (owner: 10Sau226) [23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor My software never has bugs. It just develops random features. Rise for Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180605T2300). [23:00:04] kaldari: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:16] damn it :D [23:00:17] PROBLEM - Check whether ferm is active by checking the default input chain on elastic1018 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [23:00:53] ^^ thats me. running a short test on fragmented udp packets [23:00:57] PROBLEM - Check whether ferm is active by checking the default input chain on logstash1007 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [23:01:00] same [23:01:09] (03Merged) 10jenkins-bot: Fixing very trivial spelling error in InitialiseSettings.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437441 (owner: 10Sau226) [23:01:11] (03CR) 10jenkins-bot: beta: Enable PP for newly created accounts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437502 (https://phabricator.wikimedia.org/T191888) (owner: 10Pmiazga) [23:01:25] (03PS2) 10Reedy: Drop the UnicodeConverter extension from production, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436331 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:01:26] (03CR) 10Reedy: [C: 032] Drop the UnicodeConverter extension from production, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436331 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:01:45] !log restore ferm on 
elastic1018 and logstash1009 [23:01:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:02:06] RECOVERY - Check whether ferm is active by checking the default input chain on logstash1007 is OK: OK ferm input default policy is set [23:02:27] RECOVERY - Check whether ferm is active by checking the default input chain on elastic1018 is OK: OK ferm input default policy is set [23:02:45] (03Merged) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436331 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:03:33] here [23:04:46] !log reedy@deploy1001 Synchronized rpc: 644 (duration: 00m 56s) [23:04:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:53] !log reedy@deploy1001 Synchronized composer.json: minus x! (duration: 00m 56s) [23:05:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:02] !log reedy@deploy1001 Synchronized wmf-config/: Updates (duration: 00m 58s) [23:07:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:37] (03PS2) 10Reedy: Testing page creation log on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437379 (https://phabricator.wikimedia.org/T196400) (owner: 10Kaldari) [23:08:48] (03CR) 10Reedy: [C: 032] Testing page creation log on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437379 (https://phabricator.wikimedia.org/T196400) (owner: 10Kaldari) [23:12:27] (03CR) 10Reedy: "Did this and part 4 really need splitting? 
:P" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:12:48] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4259493 (10BBlack) We faced this issue last time and went with "Wikipedia" as well, because: * Explaining "Wikimedia" and the projects quickly in a... [23:14:00] Reedy: Blame greg-g for the new rule that all changes to config have to be single-file changes, so yes, per policy it now has to be four different patches. [23:14:02] * Reedy kicks jerkins [23:14:09] James_F: This seems OTT [23:14:14] Especially as those two aren't used by MW itself [23:14:47] Reedy: They're different files… [23:15:38] Of course, the plus side of being done this way is they can go out weeks apart without anything breaking. [23:15:45] 10Operations, 10Traffic, 10User-Johan: Provide a multi-language user-faced warning regarding AES128-SHA deprecation - https://phabricator.wikimedia.org/T196371#4259512 (10Reedy) >>! In T196371#4259493, @BBlack wrote: > * Doing i18n translations of all the project names and templating them in dynamically is h... 
[23:19:19] (03Merged) 10jenkins-bot: Testing page creation log on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/437379 (https://phabricator.wikimedia.org/T196400) (owner: 10Kaldari) [23:20:20] (03PS2) 10Reedy: Drop the UnicodeConverter extension from production, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436332 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:20:24] (03CR) 10Reedy: [C: 032] Drop the UnicodeConverter extension from production, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436332 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:20:28] (03PS2) 10Reedy: Drop the UnicodeConverter extension from production, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:20:31] (03CR) 10Reedy: [C: 032] Drop the UnicodeConverter extension from production, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:20:34] (03PS2) 10Reedy: Drop the UnicodeConverter extension from production, part 4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436334 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:20:37] (03CR) 10Reedy: [C: 032] Drop the UnicodeConverter extension from production, part 4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436334 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:21:06] !log reedy@deploy1001 Synchronized wmf-config/: page creation log on Beta Labs (duration: 00m 56s) [23:21:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:54] (03Merged) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436332 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:22:06] (03Merged) 10jenkins-bot: Drop the UnicodeConverter 
extension from production, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436333 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:22:17] (03Merged) 10jenkins-bot: Drop the UnicodeConverter extension from production, part 4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436334 (https://phabricator.wikimedia.org/T195941) (owner: 10Jforrester) [23:22:54] kaldari: deployed (and beta will fix itself) [23:23:02] Thanks! [23:24:37] !log reedy@deploy1001 Synchronized wmf-config/: bye to UnicodeConverter (duration: 00m 57s) [23:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:49] !log reedy@deploy1001 Synchronized multiversion/submodules.json: minus unicodeconverter (duration: 00m 56s) [23:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:05] (03PS6) 10Reedy: Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) [23:27:10] (03CR) 10Reedy: [C: 032] Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) (owner: 10Reedy) [23:28:37] (03Merged) 10jenkins-bot: Disable DisableAccount on wikis where there are no disabled users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338792 (https://phabricator.wikimedia.org/T106067) (owner: 10Reedy) [23:28:37] James_F, this is a thing to enforce testing happening on each state the servers might be in? [23:28:52] (03PS4) 10Reedy: Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 (owner: 10Umherirrender) [23:28:57] (03CR) 10Reedy: [C: 032] Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 (owner: 10Umherirrender) [23:29:24] Krenair: Yeah, making it impossible to get into an invalid/broken state. 
[23:29:45] Krenair: My preference would be scap working atomically, but it's been six years and we still don't have it so… :-) [23:29:59] (03CR) 10Alex Monk: "Beta has Ruby versions 2.1.5p273, 2.3.3p222 and 1.9.3p484. Does this patch need to remain cherry-picked on the beta puppetmaster?" [puppet] - 10https://gerrit.wikimedia.org/r/336840 (owner: 10Hashar) [23:30:19] James_F, doesn't this rule only work if you also enforce each sync being for a single file, and them being done in the same order as the patches? [23:30:58] Well, indeed. [23:32:02] jerkins is on a go slow [23:32:15] (03Merged) 10jenkins-bot: Replace wfGetLBFactory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414310 (owner: 10Umherirrender) [23:32:24] (03PS2) 10Reedy: Add wmgBabelCategoryNames to officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432403 (owner: 10Amire80) [23:32:29] (03CR) 10Reedy: [C: 032] Add wmgBabelCategoryNames to officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432403 (owner: 10Amire80) [23:33:00] James_F: lol [23:33:28] _joe_: you said there were some mcrouter changes you were going to make? 
[23:35:16] (03Merged) 10jenkins-bot: Add wmgBabelCategoryNames to officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432403 (owner: 10Amire80) [23:35:44] (03PS2) 10Reedy: Add reference for itwiki $wgAbuseFilterActions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420237 (owner: 10Nemo bis) [23:35:49] (03CR) 10Reedy: [C: 032] Add reference for itwiki $wgAbuseFilterActions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420237 (owner: 10Nemo bis) [23:36:57] (03Merged) 10jenkins-bot: Add reference for itwiki $wgAbuseFilterActions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/420237 (owner: 10Nemo bis) [23:37:19] (03PS3) 10Reedy: Remove lines that are now part of AbuseFilter defaults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349) (owner: 10Huji) [23:37:22] (03CR) 10Reedy: [C: 032] Remove lines that are now part of AbuseFilter defaults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349) (owner: 10Huji) [23:38:38] (03Merged) 10jenkins-bot: Remove lines that are now part of AbuseFilter defaults [mediawiki-config] - 10https://gerrit.wikimedia.org/r/424974 (https://phabricator.wikimedia.org/T178349) (owner: 10Huji) [23:39:34] (03CR) 10Reedy: [C: 04-2] "Not ready" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432702 (https://phabricator.wikimedia.org/T161553) (owner: 10Andrew Bogott) [23:39:59] (03CR) 10BryanDavis: [C: 04-1] "We are not quite ready for this to land yet, but getting really really close. :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432702 (https://phabricator.wikimedia.org/T161553) (owner: 10Andrew Bogott) [23:40:19] Reedy: It's like the good old days! Anything in -config that isn't C-2'ed is fair game for immediate deployment. :-) [23:41:07] There's stuff in there that is definitely fair game [23:41:28] James_F: https://gerrit.wikimedia.org/r/#/c/301129/ ? 
[23:41:47] !log reedy@deploy1001 Synchronized rpc/: code updates (duration: 00m 56s) [23:41:51] Reedy: Eurgh. No. I need to talk to Olga to make sure she still wants it. [23:41:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:42:03] (03CR) 10Jforrester: [C: 04-2] "Needs product re-sign-off." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: 10Jforrester) [23:42:58] !log reedy@deploy1001 Synchronized wmf-config/: Config! (duration: 00m 57s) [23:43:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:43:27] (03Abandoned) 10Reedy: Flow settings: wmg -> wg migration, part 3 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/370294 (owner: 10MaxSem) [23:43:57] (03PS2) 10Reedy: Only retain private securepoll data for 60 days after election [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372180 (https://phabricator.wikimedia.org/T173393) (owner: 10Brian Wolff) [23:44:00] (03CR) 10Reedy: [C: 032] Only retain private securepoll data for 60 days after election [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372180 (https://phabricator.wikimedia.org/T173393) (owner: 10Brian Wolff) [23:44:28] (03PS2) 10Aaron Schulz: Add "memcached-mcrouter" to $wgObjectCaches as default for testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/436252 [23:44:35] (03PS4) 10Jforrester: Change default gallery mode to 'packed' on the English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) [23:44:45] (03CR) 10Jforrester: [C: 04-2] "PS4: Rebase." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/301129 (https://phabricator.wikimedia.org/T141349) (owner: 10Jforrester) [23:46:47] (03Merged) 10jenkins-bot: Only retain private securepoll data for 60 days after election [mediawiki-config] - 10https://gerrit.wikimedia.org/r/372180 (https://phabricator.wikimedia.org/T173393) (owner: 10Brian Wolff) [23:47:01] (03PS2) 10Reedy: Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki) [23:47:04] (03CR) 10Reedy: [C: 032] Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki) [23:47:38] (03CR) 10jerkins-bot: [V: 04-1] Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki) [23:47:58] 23:47:35 fork failed - Cannot allocate memory [23:48:13] (03CR) 10Reedy: [C: 032] Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki) [23:48:20] (03Merged) 10jenkins-bot: Enable DynamicPageList extension on bdwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/414109 (https://phabricator.wikimedia.org/T188109) (owner: 10Framawiki) [23:51:54] (03PS2) 10Reedy: Remove $wgNamespacesWithSubpages overrides for NS_TEMPLATE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432587 (https://phabricator.wikimedia.org/T191612) (owner: 10Gergő Tisza) [23:51:58] (03CR) 10Reedy: [C: 032] Remove $wgNamespacesWithSubpages overrides for NS_TEMPLATE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432587 (https://phabricator.wikimedia.org/T191612) (owner: 10Gergő Tisza) [23:52:19] I think we're going over the 8 patch limit for SWAT ;) [23:52:24] Reedy: OK to do an 
emergency SWAT? [23:52:26] or whatever the number is now ;) [23:52:33] (6) [23:52:36] James_F: Krinkles patch? [23:52:55] Reedy: Roan's. Even though he's busy working the election, he's also written https://gerrit.wikimedia.org/r/#/c/437648/ [23:53:09] (03Merged) 10jenkins-bot: Remove $wgNamespacesWithSubpages overrides for NS_TEMPLATE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/432587 (https://phabricator.wikimedia.org/T191612) (owner: 10Gergő Tisza) [23:53:21] I think Greg just basically said I need to stop merging random things :P [23:53:29] Bah, mis-paste. I meant https://gerrit.wikimedia.org/r/#/c/437649/ [23:53:51] I can file it as a train blocker if it makes you feel better. :-) [23:53:56] Reedy: nah, you do you, until midnight GMT [23:54:09] heh [23:54:13] it kind of feels like old times [23:54:19] Indeed! [23:54:35] .6 and .7? [23:54:41] Reedy: wanna do train next week? It would be Chad's turn in our normal rotation :P [23:54:43] Reedy: .7 [23:54:46] !log reedy@deploy1001 Synchronized wmf-config/: Config! (duration: 00m 57s) [23:54:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:54:56] Reedy: But it's not yet merged in master… [23:55:00] greg-g: I've been waiting for someone to ask me to do the releases [23:55:08] haha [23:55:10] * James_F grins. [23:55:27] Reedy: well in that case we don't even need a rotation. We can just have reedy ;)