[Debian-ha-maintainers] corosync-blackbox qb_rb_chunk_read failed
braun
braun at dc.uni-due.de
Thu Jul 23 11:20:14 BST 2015
Hello,
Thank you for the helpful information; it saved me time and nerves and
cleared up my problem. I will avoid ppa.mmogp.com and uninstall the
packages and sources, and I won't spend more time resolving this
blackbox error. The most critical part of my servers is DRBD, which is
HA by nature. I will now use my own failover scripts, which are simple
and worked for years on other squeeze servers before I turned to
pacemaker. It is a regression for me, but I will track the development
of corosync/pacemaker in Debian and hope it will be available there in
good condition before long.
Best Regards,
Christina
On 23.07.2015 07:37, Richard B Winters wrote:
> On 07/22/2015 08:47 AM, braun wrote:
>> I want to use corosync/pacemaker in production. Is this a serious error?
>>
>>
>> corosync-blackbox ends with error:
>>
>> [debug] shm size:8392704; real_size:8392704; rb->word_size:2098176
>> [debug] read total of: 8392724
>> ERROR: qb_rb_chunk_read failed: Connection timed out
>> [trace] ENTERING qb_rb_close()
>> [debug] Free'ing ringbuffer: /dev/shm/qb-create_from_file-header
>>
>> root at willi:~/errorblackbox# df -kl /dev/shm
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> tmpfs 16462300 14448 16447852 1% /dev/shm
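Incidentally, the sizes in the blackbox debug output above are internally consistent, assuming libqb counts the ring buffer in 4-byte (32-bit) words (an assumption for the amd64 build, not verified against the libqb source here):

```shell
# Check the relation between the reported sizes from the debug output:
#   shm size: 8392704, real_size: 8392704, rb->word_size: 2098176
# Assumption: the ring buffer size is counted in 4-byte words.
word_size=2098176
real_size=8392704
echo $(( word_size * 4 ))   # 8392704, matches real_size
[ $(( word_size * 4 )) -eq "$real_size" ] && echo "sizes consistent"
```

So the size fields themselves do not look corrupted; the timeout seems to occur while reading chunks, not because of a bogus buffer size.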
>>
>>
>> I followed the wiki at
>> https://wiki.debian.org/Debian-HA/ClustersFromScratch and used the
>> following packages:
>>
>> corosync:
>> Installed: 2.3.4-1
>> Candidate: 2.3.4-1
>> Version table:
>> *** 2.3.4-1 0
>> 500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>> 100 /var/lib/dpkg/status
>> 1.4.6-1.1 0
>> 500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
>> 500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>> pacemaker:
>> Installed: 1.1.12-1
>> Candidate: 1.1.12-1
>> Version table:
>> *** 1.1.12-1 0
>> 500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>> 100 /var/lib/dpkg/status
>> fence-agents:
>> Installed: 4.0.18-1
>> Candidate: 4.0.18-1
>> Version table:
>> *** 4.0.18-1 0
>> 500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>> 100 /var/lib/dpkg/status
>> 4.0.17-2 0
>> 500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>> 3.1.5-2 0
>> 500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
>> libqb0:
>> Installed: 0.17.1-4
>> Candidate: 0.17.1-4
>> Version table:
>> *** 0.17.1-4 0
>> 500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>> 100 /var/lib/dpkg/status
>> 0.17.1-1 0
>> 500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>> 0.11.1-2 0
>> 500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
>> libqb-dev:
>> Installed: 0.17.1-4
>> Candidate: 0.17.1-4
>> Version table:
>> *** 0.17.1-4 0
>> 500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>> 100 /var/lib/dpkg/status
>> 0.17.1-1 0
>> 500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>> 0.11.1-2 0
>> 500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
>>
>> corosync.conf (modified from wheezy, where I didn't try corosync-blackbox):
>>
>> totem {
>> version: 2
>>
>> ip_version: ipv4
>>
>> # How long before declaring a token lost (ms)
>> token: 4000
>>
>> # How many token retransmits before forming a new configuration
>> token_retransmits_before_loss_const: 10
>>
>> # How long to wait for join messages in the membership protocol (ms)
>> join: 1100
>>
>> # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>> consensus: 3600
>>
>> # How long to wait before checking for a partition when no multicast
>> merge: 1000
>>
>> # Turn off the virtual synchrony filter
>> vsftype: none
>>
>> # Number of messages that may be sent by one processor on receipt of the token
>> max_messages: 100
>>
>> # Limit generated nodeids to 31-bits (positive signed integers)
>> clear_node_high_bit: yes
>>
>> crypto_cipher: aes256
>> crypto_hash: sha1
>>
>> ### # Disable encryption
>> ### secauth: off
>>
>> # How many threads to use for encryption/decryption
>> ### we have 4 CPU cores C.B.
>> threads: 4
>>
>> # Optionally assign a fixed node id (integer)
>> ### taken from the IP; we don't actually need it with IPv4 C.B.
>> # nodeid: 22
>>
>> # This specifies the mode of redundant ring, which may be none, active, or passive.
>> rrp_mode: none
>>
>> ### used ucast for the interfaces, since there are only 2 hosts and it's simpler than mcast
>> ### but somehow it doesn't work
>> interface {
>> # The following values need to be set based on your environment
>> # for diskmirror
>> ringnumber: 0
>> bindnetaddr: 192.168.63.0
>> broadcast: yes
>> ##mcastaddr: 230.168.63.1
>> #mcastport: 5405
>> mcastport: 5405
>> }
>> transport: udpu
>> }
>>
>> nodelist {
>> node {
>> ring0_addr: 192.168.63.22
>> nodeid: 22
>> }
>> node {
>> ring0_addr: 192.168.63.24
>> nodeid: 24
>> }
>> }
>>
>>
>> quorum {
>> provider: corosync_votequorum
>> two_node: 1
>> expected_votes: 2
>> }
>>
>>
>>
>> logging {
>> fileline: off
>> to_stderr: yes
>> to_logfile: yes
>> to_syslog: yes
>> logfile: /var/log/corosync.log
>> syslog_facility: daemon
>> debug: off
>> timestamp: on
>> logger_subsys {
>> subsys: QUORUM
>> debug: off
>> }
>> }
>>
>> qb {
>> ipc_type: shm
>> }
>>
>> corosync.log:
>>
>> Jul 22 13:59:32 [1431] willi corosync info [MAIN ] Corosync built-in
>> features: dbus testagents rdma watchdog augeas systemd upstart xmlconf qdevices snmp pie relro bindnow
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] Initializing
>> transport (UDP/IP Unicast).
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] Initializing
>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] The network
>> interface [192.168.63.24] is now up.
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync configuration map access [0]
>> Jul 22 13:59:32 [1431] willi corosync info [QB ] server name: cmap
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync configuration service [1]
>> Jul 22 13:59:32 [1431] willi corosync info [QB ] server name: cfg
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync cluster closed process group
>> service v1.01 [2]
>> Jul 22 13:59:32 [1431] willi corosync info [QB ] server name: cpg
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync profile loading service [4]
>> Jul 22 13:59:32 [1431] willi corosync info [WD ] Watchdog is now
>> been tickled by corosync.
>> Jul 22 13:59:32 [1431] willi corosync info [WD ] no resources
>> configured.
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync watchdog service [7]
>> Jul 22 13:59:32 [1431] willi corosync notice [QUORUM] Using quorum
>> provider corosync_votequorum
>> Jul 22 13:59:32 [1431] willi corosync notice [VOTEQ ] Waiting for all
>> cluster members. Current votes: 1 expected_votes: 2
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync vote quorum service v1.0 [5]
>> Jul 22 13:59:32 [1431] willi corosync info [QB ] server name:
>> votequorum
>> Jul 22 13:59:32 [1431] willi corosync notice [SERV ] Service engine
>> loaded: corosync cluster quorum service v0.1 [3]
>> Jul 22 13:59:32 [1431] willi corosync info [QB ] server name: quorum
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] adding new UDPU
>> member {192.168.63.22}
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] adding new UDPU
>> member {192.168.63.24}
>> Jul 22 13:59:32 [1431] willi corosync notice [TOTEM ] A new membership
>> (192.168.63.24:30668) was formed. Members joined: 24
>> Jul 22 13:59:32 [1431] willi corosync notice [VOTEQ ] Waiting for all
>> cluster members. Current votes: 1 expected_votes: 2
>> Jul 22 13:59:32 [1431] willi corosync notice [VOTEQ ] Waiting for all
>> cluster members. Current votes: 1 expected_votes: 2
>> Jul 22 13:59:32 [1431] willi corosync notice [VOTEQ ] Waiting for all
>> cluster members. Current votes: 1 expected_votes: 2
>> Jul 22 13:59:32 [1431] willi corosync notice [QUORUM] Members[1]: 24
>> Jul 22 13:59:32 [1431] willi corosync notice [MAIN ] Completed service
>> synchronization, ready to provide service.
>> Jul 22 13:59:40 [1431] willi corosync notice [TOTEM ] A new membership
>> (192.168.63.22:30672) was formed. Members joined: 22
>> Jul 22 13:59:40 [1431] willi corosync notice [QUORUM] This node is
>> within the primary component and will provide service.
>> Jul 22 13:59:40 [1431] willi corosync notice [QUORUM] Members[2]: 22 24
>> Jul 22 13:59:40 [1431] willi corosync notice [MAIN ] Completed service
>> synchronization, ready to provide service.
>> root at willi:~/errorblackbox#
>>
> Hello,
>
> First please allow me to give you proper warning regarding
> ppa.mmogp.com: It is _not_ a good candidate for production use; it is a
> freely available ppa, and is _unofficial_. Please use this repository at
> your own risk.
>
> In Debian Jessie+, the Pacemaker/Corosync stack is not yet ready for
> production, and for that we apologize - in some time it will be
> officially available for Debian, but we are still testing and preparing
> all of the packages and documentation.
>
> I did some searching regarding your error, and I came across this Redhat
> issue that seems to be related, perhaps it can be of some help to you:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1114852
>
> The next time you get this error, if you could give us not only the full
> blackbox output, but also the corosync, pacemaker, libqb, and syslog
> logs - that would help us to better identify the issue, and perhaps help
> to offer a solution.
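For example, all of the requested material could be bundled with a small script along these lines (a hypothetical helper; the log paths are typical Debian defaults and may differ on your installation):

```shell
# Collect the blackbox dump plus the relevant logs into one tarball.
set -e
out="ha-logs-$(date +%Y%m%d).tar.gz"
# Capture the blackbox output; keep going even if the dump itself fails.
corosync-blackbox > blackbox.txt 2>&1 || true
files="blackbox.txt"
# Only include log files that actually exist on this host.
for f in /var/log/corosync.log /var/log/syslog; do
    if [ -f "$f" ]; then files="$files $f"; fi
done
tar czf "$out" $files
echo "wrote $out"
```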
>
> If the error is related/similar to that mentioned in the link above - a
> good point to note is that it seems to be attempting to read
> uninitialized memory:
>
>
>> [trace] ENTERING qb_rb_close()
>> [debug] Free'ing ringbuffer: /dev/shm/qb-create_from_file-header
> It should be resolved; it is a serious error.
>
>
> You could also try using a configuration generated by PCS or CRMSH,
> starting from that basic variant and building upon it. There are major
> differences in the APIs and facilities of the stack between Wheezy and
> Jessie.
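For comparison, a tool-generated configuration for a two-node udpu cluster tends to be much shorter than the hand-edited file quoted earlier. A minimal sketch of what such a file typically looks like (the cluster name and nodeids are placeholders; exact output differs between pcs/crmsh versions):

```
totem {
    version: 2
    cluster_name: debian-ha
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 192.168.63.22
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.63.24
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
```

Note that per votequorum(5), two_node: 1 also implies wait_for_all, so the cluster only becomes quorate after both nodes have been seen at least once; setting expected_votes: 2 explicitly, as in the original file, is harmless.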
>
> You can also seek help on OFTC in the #debian-ha channel, as well as
> freenode.net's #clusterlabs channel.
>
>
> I hope that helps to provide you with some direction.
>
>
> Best,
>
>
>
>
>
>
> _______________________________________________
> Debian-ha-maintainers mailing list
> Debian-ha-maintainers at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-ha-maintainers
--
**********************************************
* email: braun at dc.uni-due.de
* Christina Braun
* WiWI/ICB/Informatik
* University DUE
* Schuetzenbahn 70 tel.: + 49 201 183-3929
* D-45127 Essen fax.: + 49 201 183-2419
**********************************************