GeekLabs.dk

Esbjerg's technology workshop

driftstatus:2020-04-01

Outage description

Start: 2020-04-01 00:40:01
End: 2020-04-01 18:17:18
Duration: 17h 37m 17s
Cause: Unknown (possibly unannounced Stofa problems, which Tobakken staff tried to fix by cutting the power)
Observations:

  • No internet from 00:45 (DNS update log)
  • Power to geekzerv, and presumably the fiber gateway, cut at about 11:02 (syslog)
  • geekzerv comes back up at about 11:05 with the new public IP=95.154.36.25 (DNS update log)
  • After that, high network latency until about 16:00 (manual observation, confirmed by Munin, which had unusual difficulty running during the entire period)
  • Afterwards still some latency, but the system remained reasonably usable (Munin)

Details

Geekzerv

Last log entry: 2020-04-01T11:02:05+02:00
First log entry: 2020-04-01T11:05:07+02:00
Downtime: PT03M02S

Availability from the internet

Last availability: 2020-04-01T00:40:01+02:00 (last DNS update that went through)
First availability: 2020-04-01T11:10:02+02:00 (first DNS update that went through)
Partial availability: 2020-04-01T16:03:52+02:00 (manual observation on the command line and in Munin)
Final availability: 2020-04-01T18:17:18+02:00 (after stopping the boinc/einstein@home workunits)
Unavailability (until first availability): PT10H30M01S
Unavailability (until partial availability): PT15H23M51S
Unavailability (until final availability): PT17H37M17S
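The durations above are plain differences between the listed timestamps. As a sketch of how to recompute them, the bash functions below convert timestamps of the form YYYY-MM-DDThh:mm:ss+hh:mm to Unix time using only shell arithmetic (so no GNU date is needed) and format the difference the way this page does: hours left out when zero, minutes and seconds zero-padded. The function names to_epoch, days_from_civil and iso_duration are hypothetical, and the sketch assumes the end time is not earlier than the start time.

```shell
#!/usr/bin/env bash
# Sketch: ISO 8601 duration between two timestamps of the form
# YYYY-MM-DDThh:mm:ss+hh:mm (or -hh:mm), using only bash arithmetic.

# Days since 1970-01-01 for a Gregorian date (Howard Hinnant's days_from_civil).
days_from_civil() {
  local y=$1 m=$2 d=$3
  if (( m <= 2 )); then y=$(( y - 1 )); fi                  # year starts in March
  local era=$(( (y >= 0 ? y : y - 399) / 400 ))
  local yoe=$(( y - era * 400 ))                             # year of era, 0..399
  local doy=$(( (153 * ((m + 9) % 12) + 2) / 5 + d - 1 ))    # day of year, March = 0
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))     # day of era
  echo $(( era * 146097 + doe - 719468 ))
}

# Seconds since the Unix epoch (UTC) for one timestamp.
to_epoch() {
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})([+-])([0-9]{2}):([0-9]{2})$'
  if ! [[ $1 =~ $re ]]; then
    echo "unparseable timestamp: $1" >&2
    return 1
  fi
  local r=("${BASH_REMATCH[@]}")
  local days secs offset
  # 10# forces base 10, otherwise fields like "08" are rejected as octal
  days=$(days_from_civil $((10#${r[1]})) $((10#${r[2]})) $((10#${r[3]})))
  secs=$(( days * 86400 + 10#${r[4]} * 3600 + 10#${r[5]} * 60 + 10#${r[6]} ))
  offset=$(( 10#${r[8]} * 3600 + 10#${r[9]} * 60 ))
  # local time = UTC + offset, hence UTC = local time - offset
  if [[ ${r[7]} == "+" ]]; then
    echo $(( secs - offset ))
  else
    echo $(( secs + offset ))
  fi
}

# Duration formatted like this page: hours left out when zero,
# minutes and seconds zero-padded. Assumes end >= start.
iso_duration() {
  local start end d h m s
  start=$(to_epoch "$1") || return 1
  end=$(to_epoch "$2") || return 1
  d=$(( end - start ))
  h=$(( d / 3600 )); m=$(( d % 3600 / 60 )); s=$(( d % 60 ))
  if (( h > 0 )); then
    printf 'PT%02dH%02dM%02dS\n' "$h" "$m" "$s"
  else
    printf 'PT%02dM%02dS\n' "$m" "$s"
  fi
}
```

For example, iso_duration 2020-04-01T00:40:01+02:00 2020-04-01T11:10:02+02:00 prints PT10H30M01S, and the geekzerv log entries 11:02:05 and 11:05:07 give PT03M02S.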

Booted into a new kernel: Linux version 4.4.0-171.200-generic

Interventions

  • Munin sends error mails about "Work timed out before all workers finished" and "Lock already exists", so restarting Munin was attempted a couple of times.
  • 12 OOM-killer invocations by boinc/einstein@home workunits in syslog since the restart. CPU load does not look serious in top, but the two running boinc/einstein@home workunits were stopped anyway at 2020-04-01T18:17:18+02:00. This noticeably improved the responsiveness of the whole system (it is unclear why; even the CPU data collected by Munin shows a lot of idle time in the period).
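To check whether the OOM-killer activity clusters in the high-latency period before 16:00, the kernel lines can be counted per hour. A minimal sketch, assuming the classic "Mon DD hh:mm:ss host ..." syslog prefix and the "<process> invoked oom-killer:" kernel message shown in the syslog excerpt on this page; oom_per_hour is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count kernel "invoked oom-killer" lines per hour of a syslog stream.
# Assumes the classic "Mon DD hh:mm:ss host ..." prefix; hours are printed
# in the order they first appear, which is chronological for one log file.
oom_per_hour() {
  awk '/invoked oom-killer/ {
         split($3, t, ":")                 # $3 is hh:mm:ss
         key = $1 " " $2 " " t[1] ":00"    # e.g. "Apr 1 11:00"
         if (!(key in n)) order[++hours] = key
         n[key]++
       }
       END { for (i = 1; i <= hours; i++) print order[i], n[order[i]] }'
}
```

Usage: oom_per_hour < /var/log/syslog. Keeping first-seen order avoids sorting month names, which a plain sort would get wrong.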

syslog

Apr  1 11:02:05 geekzerv named[2210]: network unreachable resolving 'ns4.gratisdns.dk/AAAA/IN': 2001:7fe::53#53
Apr  1 11:02:05 geekzerv named[2210]: network unreachable re

.. <power cut> ..

Apr  1 11:05:07 geekzerv rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="1611" x-info="http://www.rsyslog.com"] start
Apr  1 11:05:07 geekzerv systemd-modules-load[888]: Inserted module 'iscsi_tcp'
Apr  1 11:05:07 geekzerv systemd-modules-load[888]: Inserted module 'ib_iser'
Apr  1 11:05:07 geekzerv loadkeys[890]: Loading /etc/console-setup/cached.kmap.gz
Apr  1 11:05:07 geekzerv systemd[1]: Started Load Kernel Modules.
Apr  1 11:05:07 geekzerv systemd[1]: Started Set console keymap.
Apr  1 11:05:07 geekzerv systemd[1]: Started LVM2 metadata daemon.
Apr  1 11:05:07 geekzerv systemd[1]: Mounting FUSE Control File System...
Apr  1 11:05:07 geekzerv systemd[1]: Starting Apply Kernel Variables...
Apr  1 11:05:07 geekzerv systemd[1]: Starting Flush Journal to Persistent Storage...
Apr  1 11:05:07 geekzerv systemd[1]: Mounted FUSE Control File System.
Apr  1 11:05:07 geekzerv systemd[1]: Started Load/Save Random Seed.
Apr  1 11:05:07 geekzerv systemd[1]: Started udev Coldplug all Devices.
Apr  1 11:05:07 geekzerv systemd[1]: Started Flush Journal to Persistent Storage.
Apr  1 11:05:07 geekzerv lvm[904]:   2 logical volume(s) in volume group "geekzerv-vg" monitored
Apr  1 11:05:07 geekzerv systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Apr  1 11:05:07 geekzerv systemd[1]: Started Apply Kernel Variables.
Apr  1 11:05:07 geekzerv systemd[1]: Started Create Static Device Nodes in /dev.
Apr  1 11:05:07 geekzerv systemd[1]: Starting udev Kernel Device Manager...
Apr  1 11:05:07 geekzerv systemd[1]: Reached target Local File Systems (Pre).
Apr  1 11:05:07 geekzerv kernel: [    0.000000] microcode: CPU0 microcode updated early to revision 0xbc, date = 2010-10-03
Apr  1 11:05:07 geekzerv kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr  1 11:05:07 geekzerv kernel: [    0.000000] Initializing cgroup subsys cpu
Apr  1 11:05:07 geekzerv kernel: [    0.000000] Initializing cgroup subsys cpuacct
Apr  1 11:05:07 geekzerv kernel: [    0.000000] Linux version 4.4.0-171-generic (buildd@lcy01-amd64-018) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) ) #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019 (Ubuntu 4.4.0-171.200-generic 4.4.203)
Apr  1 11:05:07 geekzerv kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-171-generic root=/dev/mapper/geekzerv--vg-root ro
..
..
..
Apr  1 11:05:22 geekzerv systemd[1]: Starting LSB: Apache2 web server...
..
Apr  1 11:05:22 geekzerv apache2[2396]:  * Starting Apache httpd web server apache2
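The last and first log entries around the power cut (11:02:05 and 11:05:07) can be found by looking for large jumps between consecutive syslog timestamps. A minimal sketch, assuming the "Mon DD hh:mm:ss" prefix shown above and that all lines are from the same year (syslog omits the year); the function name syslog_gaps and the 60-second default threshold are arbitrary choices.

```shell
#!/usr/bin/env bash
# Sketch: report pauses in a syslog stream longer than a threshold, with the
# last line before and the first line after each pause.
# Assumptions: classic "Mon DD hh:mm:ss host ..." prefix, all lines from the
# same year (a pause over New Year is miscounted), and leap days ignored
# (a pause spanning 29 Feb is off by one day).
syslog_gaps() {
  local threshold=${1:-60}   # minimum pause in seconds (arbitrary default)
  awk -v thr="$threshold" '
    BEGIN {
      thr += 0
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      split("0 31 59 90 120 151 181 212 243 273 304 334", cum, " ")
      for (i = 1; i <= 12; i++) mstart[names[i]] = cum[i]   # day offset of month
    }
    ($1 in mstart) {
      split($3, t, ":")                                     # $3 is hh:mm:ss
      now = (mstart[$1] + $2 - 1) * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && now - prev >= thr)
        printf "gap of %d s\n  last:  %s\n  first: %s\n", now - prev, prevline, $0
      prev = now; prevline = $0; seen = 1
    }
  '
}
```

Usage: syslog_gaps 120 < /var/log/syslog lists every pause of two minutes or more. On a server that logs regularly each hit is a candidate outage or reboot; a quieter machine needs a higher threshold to avoid false positives.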

syslog (oom-kill):

miki@geekzerv:~$ grep oom-killer /var/log/syslog|grep einstein|wc -l
12
miki@geekzerv:~$ grep oom-killer /var/log/syslog|grep einstein|head -1
Apr  1 11:21:29 geekzerv kernel: [  998.944940] einstein_O2MD1_ invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
miki@geekzerv:~$ grep oom-killer /var/log/syslog|grep einstein|tail -1
Apr  1 17:59:21 geekzerv kernel: [24873.622764] einstein_O2MD1_ invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0
miki@geekzerv:~$ zgrep oom-killer /var/log/syslog*|wc -l
29
miki@geekzerv:~$ zgrep oom-killer /var/log/syslog|wc -l
29
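The commands above find 29 OOM-killer invocations in total, of which 12 were triggered by einstein processes. A sketch that groups all of them by the triggering process shows what the remaining ones were. It assumes the same kernel line format; oom_by_process is a hypothetical helper. Note that the process named in an "invoked oom-killer" line is the one whose memory allocation triggered the OOM killer, which is not necessarily the process that got killed.

```shell
#!/usr/bin/env bash
# Sketch: count "invoked oom-killer" kernel lines per triggering process.
# The name is the word right before "invoked oom-killer:"; that process made
# the allocation that triggered the OOM killer, it is not necessarily the one
# that was killed.
oom_by_process() {
  awk '{
         for (i = 2; i < NF; i++)
           if ($i == "invoked" && $(i + 1) == "oom-killer:") n[$(i - 1)]++
       }
       END { for (p in n) print n[p], p }' | sort -rn
}
```

Usage: zcat -f /var/log/syslog* | oom_by_process also covers compressed rotated logs, like the zgrep commands above.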

duckdns.log

2020-04-01T00:40:01+02:00
OK 89.184.153.87  NOCHANGE
OK A record on mikini.dk: 89.184.153.87 (A on mikini.dk) == 89.184.153.87 (A on geeklabs.duckdns.org)
OK A record on geeklabs.dk: 89.184.153.87 (A on geeklabs.dk) == 89.184.153.87 (A on geeklabs.duckdns.org)
OK A record on kirkgaard.biz: 89.184.153.87 (A on kirkgaard.biz) == 89.184.153.87 (A on geeklabs.duckdns.org)

2020-04-01T00:45:01+02:00


2020-04-01T00:50:02+02:00


2020-04-01T00:55:01+02:00


2020-04-01T01:00:01+02:00

...
...

2020-04-01T10:55:01+02:00

Error digging A record of geeklabs.duckdns.org: status 9
Output: ;; connection timed out; no servers could be reached

2020-04-01T11:00:01+02:00

.. <NUL bytes> ..

2020-04-01T11:10:02+02:00
OK 95.154.36.25  UPDATED
ERROR A record on mikini.dk: 89.184.153.87 (A on mikini.dk) != 95.154.36.25 (A on geeklabs.duckdns.org)
ERROR A record on geeklabs.dk: 89.184.153.87 (A on geeklabs.dk) != 95.154.36.25 (A on geeklabs.duckdns.org)
ERROR A record on kirkgaard.biz: 89.184.153.87 (A on kirkgaard.biz) != 95.154.36.25 (A on geeklabs.duckdns.org)

2020-04-01T11:15:14+02:00
OK 95.154.36.25  NOCHANGE
ERROR A record on mikini.dk: 89.184.153.87 (A on mikini.dk) != 95.154.36.25 (A on geeklabs.duckdns.org)
ERROR A record on geeklabs.dk: 89.184.153.87 (A on geeklabs.dk) != 95.154.36.25 (A on geeklabs.duckdns.org)
ERROR A record on kirkgaard.biz: 89.184.153.87 (A on kirkgaard.biz) != 95.154.36.25 (A on geeklabs.duckdns.org)
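The OK/ERROR lines in duckdns.log compare each domain's A record with the one for geeklabs.duckdns.org. The cron job that writes them is not shown on this page, so the following is a hypothetical reconstruction in bash that produces lines in the same format using dig +short. The domain list is taken from the log; the function names are made up. dig exits with status 9 when no server replies, which matches the "status 9" error logged at 10:55.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the A-record check, not the original script.
REFERENCE=geeklabs.duckdns.org
DOMAINS=(mikini.dk geeklabs.dk kirkgaard.biz)

# Print one log line comparing a domain's A record with the reference.
compare_a() {   # args: domain domain_ip reference_ip
  if [[ $2 == "$3" ]]; then
    echo "OK A record on $1: $2 (A on $1) == $3 (A on $REFERENCE)"
  else
    echo "ERROR A record on $1: $2 (A on $1) != $3 (A on $REFERENCE)"
  fi
}

# Resolve a name to its first IPv4 address. dig exits with status 9
# ("no reply from server") when no resolver can be reached.
lookup_a() {
  local out status
  out=$(dig +short A "$1")
  status=$?
  if (( status != 0 )); then
    echo "Error digging A record of $1: status $status" >&2
    echo "Output: $out" >&2
    return "$status"
  fi
  # +short may list CNAME targets first; keep the first IPv4 address
  printf '%s\n' "$out" | awk '/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { print; exit }'
}

# Compare every domain against the reference name.
check_all() {
  local ref ip d
  ref=$(lookup_a "$REFERENCE") || return
  for d in "${DOMAINS[@]}"; do
    ip=$(lookup_a "$d") || continue
    compare_a "$d" "$ip" "$ref"
  done
}
```

Usage: check_all >> duckdns.log 2>&1 from cron. Errors go to stderr so that they are not captured as the looked-up addresses.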

driftstatus/2020-04-01.txt · Last modified: 2020/04/01 18:43 by miki