[00:00:40] PROBLEM - SSH on srv276 is CRITICAL: Server answer:
[00:00:50] RECOVERY - DPKG on srv262 is OK: All packages OK
[00:25:57] Commons is running slowly
[00:30:20] RECOVERY - SSH on srv276 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0)
[00:30:20] RECOVERY - DPKG on srv276 is OK: All packages OK
[00:32:50] RECOVERY - Disk space on srv276 is OK: DISK OK
[00:35:28] Reedy: idk if it's "worse" but we have a new report ^
[00:35:36] Indeed
[00:35:56] I'd rather not actively ping a load of ops people on a weekend cause it's a bit slow
[00:35:59] For obvious reasons
[00:36:27] well do we know quantitatively that it's slow?
[00:36:40] http://status.wikimedia.org/
[00:36:43] Computer says no
[00:36:44] i haven't really looked at ganglia
[00:37:12] status.wm.o ain't a graph :/
[00:37:21] Click into the details
[00:37:21] http://status.wikimedia.org/8777/178323/Wiki-platform-[[w:en:Main-Page]]-(s1)---UNCACHED
[00:37:56] http://status.wikimedia.org/8777/178335/Wiki-commons-(s4)---UNCACHED
[00:38:01] Commons is reportedly faster :p
[00:38:18] oh, there are graphs. forgot about those
[00:38:20] RECOVERY - Apache HTTP on srv276 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.041 second response time
[00:38:30] RECOVERY - RAID on srv276 is OK: OK: no RAID installed
[00:38:44] idk, I try to move a file, but it just takes a long time
[00:38:59] *to actually execute
[00:39:20] techman224: there's a *lot* of moving parts. important to know that it's when you try to move a file...
[00:39:55] Hell, if someone said bugzilla was slow
[00:40:12] I wouldn't be questioning it
[00:40:37] well you wouldn't care so much i bet :P
[00:40:52] http://ganglia3.wikimedia.org/?c=Miscellaneous%20pmtpa&h=kaulen.wikimedia.org&m=load_one&r=hour&s=by%20name&hc=4&mc=2
[00:40:56] The host looks overloaded
[00:41:17] whoa. that's nickel?
[00:41:25] What?
[00:41:33] nickel.wm.o
[00:41:48] What's nickel? Ganglia hasn't been moved yet AFAIK
[00:42:09] well, what you linked to looks a lot different than ganglia.wm.o
[00:42:33] ganglia3 has been in since around when asher started
[00:42:37] It's much faster now
[00:43:02] Spence is busy, nobody cares
[00:43:03] huh. idk then
[00:48:14] Do the servers auto-fix themselves?
[00:48:54] Sometimes
[00:48:58] Depends what's wrong
[00:49:34] not in the petan/ryan lane sense
[00:50:47] but sometimes in the mysteriously no longer broken sense. or there was a traffic spike or something that went away
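The exchange above comes down to "check a number before deciding the site is slow": the ganglia3 link plots the load_one metric for kaulen. As a rough illustration only (not what anyone in the channel actually ran), here is a minimal Python sketch of pulling that same metric directly from a gmond collector's XML dump, assuming a collector reachable on gmond's default XML port 8649; the collector hostname is a placeholder, not a real WMF host.

```python
# Hypothetical sketch: read gmond's XML dump and extract load_one for one host.
# The collector hostname below is made up; only the metric/host names come
# from the ganglia3 URL in the log.
import socket
import xml.etree.ElementTree as ET

GMOND_HOST = "ganglia-collector.example"  # placeholder collector
GMOND_PORT = 8649                         # gmond's default XML port
TARGET_HOST = "kaulen.wikimedia.org"

def fetch_load_one():
    # gmond dumps the cluster state as XML and closes the connection.
    with socket.create_connection((GMOND_HOST, GMOND_PORT), timeout=10) as s:
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    root = ET.fromstring(b"".join(chunks))
    for host in root.iter("HOST"):
        if host.get("NAME") == TARGET_HOST:
            for metric in host.iter("METRIC"):
                if metric.get("NAME") == "load_one":
                    return float(metric.get("VAL"))
    return None

if __name__ == "__main__":
    print("load_one on %s: %s" % (TARGET_HOST, fetch_load_one()))
```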
[01:29:49] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[02:05:12] !log LocalisationUpdate completed (1.18) at Sun Jan 8 02:05:11 UTC 2012
[02:05:15] Logged the message, Master
[02:37:42] Request: POST http://da.wikipedia.org/w/index.php?title=Bruger:Sir48&action=submit, from 91.198.174.54 via sq61.wikimedia.org (squid/2.7.STABLE9) to ()
[02:37:42] Error: ERR_CANNOT_FORWARD, errno [No Error] at Sun, 08 Jan 2012 02:36:26 GMT
[02:58:45] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Sun Jan 8 02:58:38 UTC 2012
[03:13:13] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[03:21:23] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[03:34:23] RECOVERY - Puppet freshness on db1003 is OK: puppet ran at Sun Jan 8 03:34:11 UTC 2012
[03:35:23] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
[03:49:13] PROBLEM - MySQL replication status on db1025 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1401s
[03:57:34] RECOVERY - MySQL replication status on db1025 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[04:21:24] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:22:14] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:37:04] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[06:57:08] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[10:06:07] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 442897 MB (3% inode=99%):
[10:09:57] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 414797 MB (3% inode=99%):
[10:24:07] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:34:56] zzz
[13:23:06] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[13:31:07] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[13:44:37] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
[14:08:37] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[14:17:25] someone around?
[14:17:27] from ops
[14:52:34] RECOVERY - Squid on brewster is OK: TCP OK - 0.002 second response time on port 8080
[14:54:47] * ToAruShiroiNeko wonders if squid servers squirt ink.
[16:43:49] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Sun Jan 8 16:43:48 UTC 2012
[20:26:15] page deletions and blocks aren't showing up on watchlists over at enwiktionary.
[21:42:08] zzz =_=
[21:46:35] !log someone started incremental updating on searchidx1 ??!!
[21:46:38] Logged the message, Master
[21:50:54] !log killed broken search indexer thread on searchidx1 (please note searchidx1 is no longer in use!), and restarted incremental indexing on searchidx2 which was somehow broken
[21:50:55] Logged the message, Master
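The db1025 and es1004 alerts earlier in this log come from a replication-lag/slave-status style probe: read SHOW SLAVE STATUS, look at Seconds_Behind_Master and whether the slave threads are running, and map the result to a status line. A minimal sketch of such a probe, assuming PyMySQL and placeholder connection details; the thresholds are arbitrary example values, not the ones this monitoring actually used.

```python
# Hypothetical replication-lag probe in the spirit of the alerts above.
# Credentials, host, and thresholds are illustrative placeholders.
import sys
import pymysql

WARN, CRIT = 300, 600  # lag thresholds in seconds (example values)

def check_lag(host, user, password):
    conn = pymysql.connect(host=host, user=user, password=password,
                           cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW SLAVE STATUS")
            status = cur.fetchone()
    finally:
        conn.close()

    # "Slave running: expected Yes, got No" corresponds to this branch.
    if not status or status["Slave_SQL_Running"] != "Yes":
        print("CRITICAL - Slave running: expected Yes, got No")
        return 2
    lag = status["Seconds_Behind_Master"]
    if lag is None:
        print("CRITICAL - Seconds_Behind_Master is NULL")
        return 2
    if lag >= CRIT:
        print("CRITICAL - Seconds_Behind_Master : %ds" % lag)
        return 2
    if lag >= WARN:
        print("WARNING - Seconds_Behind_Master : %ds" % lag)
        return 1
    print("OK - Seconds_Behind_Master : %ds" % lag)
    return 0

if __name__ == "__main__":
    # Nagios-style exit codes: 0 OK, 1 WARNING, 2 CRITICAL.
    sys.exit(check_lag("db-replica.example", "monitor", "secret"))
```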
[22:06:40] ntpd still needs kicking @ wikitech. compare the 2 timestamps in this log msg. also compare vs. time of the revision and time it occurred in the channel http://wikitech.wikimedia.org/index.php?title=Server_admin_log&diff=41817&oldid=41816
[22:06:56] 08 02:05:12 <+logmsgbot> !log LocalisationUpdate completed (1.18) at Sun Jan 8 02:05:11 UTC 2012
[22:07:06] * 02:01 logmsgbot: LocalisationUpdate completed (1.18) at Sun Jan 8 02:05:11 UTC 2012
[22:08:10] wiki says that edit was at 2012-01-08T02:01:59
[22:58:32] gn8 folks
[23:13:56] !log reedy synchronized wmf-config/CommonSettings.php 'Add cp1042 to XFF'
[23:13:58] Logged the message, Master
[23:20:48] jeremyb: interesting, a wikibot traveling back in time
[23:21:02] haven't seen that since.. well, never actually.
[23:21:09] Krinkle: just mediawiki with a broken clock
[23:21:15] :)
[23:21:33] !log reedy synchronized wmf-config/CommonSettings.php 'Add cp1001-cp1041'
[23:21:35] Logged the message, Master
[23:21:46] o.0
[23:22:00] Krinkle: i mentioned it like a week ago and i don't remember what the verdict was but it obviously wasn't fixed
[23:22:57] Krinkle: maybe someone that has RT creation access (i.e. not me) can make one. or just `sudo service ntp restart`
[23:24:08] !log For some reason cp1001-10042 weren't listed in CommonSettings.php XFF, but (at least) 1042 was in service, meaning edits were attributed to it
[23:24:09] Logged the message, Master
[23:26:14] Reedy: s/10042/1042/ ?
[23:26:32] lol
[23:26:35] close enough
[23:32:18] PROBLEM - Puppet freshness on cp1043 is CRITICAL: Puppet has not run in the last 10 hours
[23:40:18] PROBLEM - Puppet freshness on cp1044 is CRITICAL: Puppet has not run in the last 10 hours
[23:54:18] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
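On the XFF issue logged at 23:24: when a caching proxy is not listed as trusted, the application stops walking past it in the X-Forwarded-For chain and attributes the request (and any edits) to the proxy's own address rather than the real client. The sketch below is a generic illustration of that failure mode, not MediaWiki's actual CommonSettings.php mechanism; the addresses and the trusted-proxy set are made up.

```python
# Illustration of XFF handling with a trusted-proxy list. All addresses are
# placeholders; this is not MediaWiki code.
TRUSTED_PROXIES = {"10.64.0.41"}   # imagine cp1041 is listed but cp1042 is not

def effective_client_ip(remote_addr, xff_header):
    """Return the IP a request would be attributed to."""
    chain = [ip.strip() for ip in xff_header.split(",")] + [remote_addr]
    # Walk from the connecting peer back toward the client, skipping hops
    # we trust to have appended honest X-Forwarded-For entries.
    for ip in reversed(chain):
        if ip not in TRUSTED_PROXIES:
            return ip
    return chain[0]

# Trusted proxy forwards the request: the real client shows through.
print(effective_client_ip("10.64.0.41", "198.51.100.7"))   # -> 198.51.100.7
# Untrusted proxy forwards it: the edit is attributed to the proxy itself,
# which is what happened before cp1042 (and cp1001-cp1041) were added.
print(effective_client_ip("10.64.0.42", "198.51.100.7"))   # -> 10.64.0.42
```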