Bug #659

OML server segfault

Added by Christoph Dwertmann 4 months ago. Updated 3 months ago.

Status:New Start date:24/01/2012
Priority:Normal Due date:
Assignee:- % Done:

0%

Category:server
Target version:-

Description

Here's an email from Pieter Becue:

I am trying to run all the basic OMF functionality tests, to ensure we have a correct OMF/OML setup.

However, when I try to run test03.rb (from repository/test/exp) it gives me this output : 
I am using oml2-apps-2.5.0 and oml server 2.6.1

pbecue@ExpController:/usr/share/omf-expctl-5.3/repository/test/exp$ omf exec test03.rb 

 INFO NodeHandler: OMF Experiment Controller 5.3 (git 4c44bbe)
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2012-01-23t15.00.09+01.00
 INFO NodeHandler: Message authentication is disabled
 INFO NodeHandler: Web interface available at: http://10.11.31.22:4000
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: value = 210 (Fixnum)
 INFO property.resetTries: value = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load test03.rb
 INFO property.res1: value = "omf.ibbt.open.node34" (String)
 INFO property.res2: value = "omf.ibbt.open.node35" (String)
 INFO property.mode1: value = "adhoc" (String)
 INFO property.mode2: value = "adhoc" (String)
 INFO property.wifi: value = "g" (String)
 INFO property.channel: value = "6" (String)
 INFO Topology: Loading topology 'omf.ibbt.open.node34'.
 INFO Topology: Loading topology 'omf.ibbt.open.node35'.
 INFO Experiment: Switching ON resources which are OFF
 INFO ALL_UP_AND_INSTALLED: Event triggered. Starting the associated tasks.
 INFO exp: Request from Experiment Script: Wait for 10s....
 INFO omf.ibbt.open.node34: Device 'net/w0' reported Not-Associated
 INFO omf.ibbt.open.node35: Device 'net/w0' reported Not-Associated
 INFO exp: Request from Experiment Script: Wait for 10s....
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.ibbt.open.node35' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is '# ERROR Socket(sock): the remote peer closed the connection'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.ibbt.open.node35' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is '    32: Broken pipe'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.ibbt.open.node34' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is '# ERROR Socket(sock): the remote peer closed the connection'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.ibbt.open.node34' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is '    32: Broken pipe'
 INFO exp: TEST - Running ifconfig on Sender
 INFO exp: Request from Experiment Script: Wait for 15s....
 INFO exp: TEST - Running ifconfig on Receiver
 INFO exp: Request from Experiment Script: Wait for 15s....
 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO run: Experiment default_slice-2012-01-23t15.00.09+01.00 finished after 1:0

And this on the side of the OML server :  

INFO   OML2 Server V2.6.1
INFO   OML Protocol V3
INFO   Copyright 2007-2011 NICTA
DEBUG  Socket(server): Socket 3 successfully created
DEBUG  Socket(server): Socket bound to port: 3003
INFO   Database backend: 'sqlite'
INFO   Creating SQLite3 databases in /home/pbecue/usr/var/oml2
INFO   'server-client:4': new client connected
DEBUG3  Eventloop:  received 12 octets
DEBUG  'server-client:4': C_HEADER: received data, 12
DEBUG  'server-client:4': Meta <protocol>:<3>
DEBUG  'server-client:4': buffer repacked to 12 bytes (end of this pass)
DEBUG3  Eventloop:  received 440 octets
DEBUG  'server-client:4': C_HEADER: received data, 440
DEBUG  'server-client:4': Meta <experiment-id>:<default_slice-2012-01-23t15.00.09+01.00>
INFO   New experiment 'default_slice-2012-01-23t15.00.09+01.00'
INFO   Opening SQLite database for experiment 'default_slice-2012-01-23t15.00.09+01.00', path is '/home/pbecue/usr/var/oml2/default_slice-2012-01-23t15.00.09+01.00.sq3'
DEBUG  prepare to exec BEGIN TRANSACTION; 
DEBUG  Database table list seems empty; need to create tables.
DEBUG  Got table list with 0 tables in it
DEBUG  prepare to exec CREATE TABLE _senders (name TEXT PRIMARY KEY, id INTEGER UNIQUE); 
DEBUG  prepare to exec CREATE TABLE _experiment_metadata (key TEXT PRIMARY KEY, value TEXT); 
DEBUG  prepare to exec INSERT INTO "_experiment_metadata" ("key", "value") VALUES ('start_time', '1327334534'); 
INFO   Experiment 'default_slice-2012-01-23t15.00.09+01.00': Set DB start-time = 1327334534
DEBUG  'server-client:4': Meta <start_time>:<1327360150>
DEBUG  'server-client:4': Meta <sender-id>:<Receiver>
DEBUG  prepare to exec INSERT INTO "_senders" ("name", "id") VALUES ('Receiver', '1'); 
DEBUG  'server-client:4': Meta <app-name>:<>
DEBUG  'server-client:4': Meta <>:<1 _udp_in ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double src_host:string src_port_avg:double src_port_min:double src_port_max:double>
WARN   'server-client:4': Unknown meta info '' (1 _udp_in ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double src_host:string src_port_avg:double src_port_min:double src_port_max:double) ignored
DEBUG  'server-client:4': Meta <content>:<binary>
DEBUG  'server-client:4': remaining bytes not enough to process a measurement
DEBUG  'server-client:4': buffer repacked to 0 bytes (end of this pass)
INFO   'server-client:8': new client connected
DEBUG3  Eventloop:  received 12 octets
DEBUG  'server-client:8': C_HEADER: received data, 12
DEBUG  'server-client:8': Meta <protocol>:<3>
DEBUG  'server-client:8': buffer repacked to 12 bytes (end of this pass)
DEBUG3  Eventloop:  received 439 octets
DEBUG  'server-client:8': C_HEADER: received data, 439
DEBUG  'server-client:8': Meta <experiment-id>:<default_slice-2012-01-23t15.00.09+01.00>
INFO   Experiment 'default_slice-2012-01-23t15.00.09+01.00': Database 'default_slice-2012-01-23t15.00.09+01.00' already open (1 clients)
DEBUG  'server-client:8': Meta <start_time>:<1322871785>
DEBUG  'server-client:8': Meta <sender-id>:<Sender>
DEBUG  prepare to exec INSERT INTO "_senders" ("name", "id") VALUES ('Sender', '2'); 
DEBUG  'server-client:8': Meta <app-name>:<>
DEBUG  'server-client:8': Meta <>:<1 _udp_out ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double dst_host:string dst_port_avg:double dst_port_min:double dst_port_max:double>
WARN   'server-client:8': Unknown meta info '' (1 _udp_out ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double dst_host:string dst_port_avg:double dst_port_min:double dst_port_max:double) ignored
DEBUG  'server-client:8': Meta <content>:<binary>
DEBUG  'server-client:8': remaining bytes not enough to process a measurement
DEBUG  'server-client:8': buffer repacked to 0 bytes (end of this pass)
DEBUG3  Eventloop:  received 121 octets
DEBUG  'server-client:8': C_BINARY_DATA: received data, 121
Segmentation fault

The EC log files are attached.

Any idea what might be causing the segmentation fault ? I saw some bug reports in the repository about segmentation faults with OML, but I think those were all fixed in the versions I am using ? 

Thanks in advance !

Best regards,

Pieter

default_slice-2012-01-23t14.16.31_01.00.log (47.7 kB) Christoph Dwertmann, 24/01/2012 03:54 pm

default_slice-2012-01-23t14.16.31_01.00-state.xml (9.5 kB) Christoph Dwertmann, 24/01/2012 03:54 pm

History

Updated by Olivier Mehani 4 months ago

Maybe upgrading oml-apps could help?

Updated by Olivier Mehani 3 months ago

On Fri, Jan 27, 2012 at 09:05:19PM +1100, Pieter Becue wrote:

I re-ran the experiment with liboml-2.6.1 and unfortunately got the
same results (when running test03.rb) :

Updated by Olivier Mehani 3 months ago

Running the experiment on norbit gives the following, but seems to work.

$ omf-5.3  exec /usr/share/omf-expctl-5.3/repository/test/exp/test03.rb  -- --res1 omf.nicta.node1 --res2 omf.nicta.node2

 INFO NodeHandler: OMF Experiment Controller 5.3 (git 552e614)
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2012-02-10t11.39.42+11.00
 INFO NodeHandler: Message authentication is disabled
 WARN server: TCPServer Error: Address already in use - bind(2)
 WARN server: TCPServer Error: Address already in use - bind(2)
 INFO NodeHandler: Web interface available at: http://10.0.0.200:4000
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: value = 210 (Fixnum)
 INFO property.resetTries: value = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load /usr/share/omf-expctl-5.3/repository/test/exp/test03.rb
 INFO property.res1: value = "omf.nicta.node1" (String)
 INFO property.res2: value = "omf.nicta.node2" (String)
 INFO property.mode1: value = "adhoc" (String)
 INFO property.mode2: value = "adhoc" (String)
 INFO property.wifi: value = "g" (String)
 INFO property.channel: value = "6" (String)
 INFO Topology: Loading topology 'omf.nicta.node1'.
 INFO Topology: Loading topology 'omf.nicta.node2'.
 INFO Experiment: Switching ON resources which are OFF
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1)
 INFO ALL_UP_AND_INSTALLED: Event triggered. Starting the associated tasks.
 INFO exp: Request from Experiment Script: Wait for 10s....
 INFO omf.nicta.node2: Device 'net/w0' reported Not-Associated
 INFO omf.nicta.node1: Device 'net/w0' reported Not-Associated
 INFO exp: Request from Experiment Script: Wait for 10s....
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node1' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is 'WARNING:  using custom log functions is now deprecated;'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node2' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is 'WARNING:  using custom log functions is now deprecated;'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node2' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is 'WARNING:  this feature will be removed in OML v2.7.0'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node1' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is 'WARNING:  this feature will be removed in OML v2.7.0'
 INFO omf.nicta.node2: Device 'net/w0' reported 36:8D:7E:1E:A1:1D
 INFO omf.nicta.node1: Device 'net/w0' reported 36:8D:7E:1E:A1:1D
 INFO exp: TEST - Running ifconfig on Sender
 INFO exp: Request from Experiment Script: Wait for 15s....
 INFO exp: TEST - Running ifconfig on Receiver
 INFO exp: Request from Experiment Script: Wait for 15s....
 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO run: Experiment default_slice-2012-02-10t11.39.42+11.00 finished after 2:15

I reckon the TCPServer Error should be ignored. The important bit seems to be the messages saying

ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node1' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is 'WARNING:  using custom log functions is now deprecated;'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node2' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is 'WARNING:  using custom log functions is now deprecated;'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node2' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otr2#1'
ERROR NodeHandler:   The error message is 'WARNING:  this feature will be removed in OML v2.7.0'
ERROR NodeHandler: 
ERROR NodeHandler:   The resource 'omf.nicta.node1' reports that an error occured 
ERROR NodeHandler:   while running the application 'test:app:otg2#1'
ERROR NodeHandler:   The error message is 'WARNING:  this feature will be removed in OML v2.7.0'

where Pieter's instance segfaults.

Pieter: would you mind giving us the output of dpkg -l \*oml\* on your experiment controller and your nodes? What image are you using on the nodes?

Updated by Pieter Becue 3 months ago

The output of dpkg -l \*oml\* :

root@*ExpController*:~# dpkg -l \*oml\*
Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu3 OML: The Orbit Measurement Library
ii liboml2-dev 2.6.1-ubuntu3 OML measurement library headers
un oml-server <none> (no description available)

On resource:
root@*node34*:~# dpkg -l \*oml\*
Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu3 OML: The Orbit Measurement Library
ii liboml2-dev 2.6.1-ubuntu3 OML measurement library headers
un oml-server <none> (no description available)
ii oml2-otg2 2.4.0-ubuntu1 Orbit Traffic Generator

/usr/bin/otg2 on the resource refers to oml2-apps-2.5.0/otg/oml2-otg

Updated by Olivier Mehani 3 months ago

Hi Pieter,

Where is your OML server (the one that segfaults) located?

On our EC, I get the following output (truncated):

$ dpkg -l \*oml\* 
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                        Version                     Description
+++-===========================-===========================-======================================================================
rc  liboml                      2.3.9-ubuntu2               OML: The Orbit Measurement Library
un  liboml-dev                  <none>                      (no description available)
ii  liboml2-0                   2.6.1-ubuntu3               OML: The Orbit Measurement Library
ii  liboml2-dev                 2.6.1-ubuntu3               OML measurement library headers
(...)
ii  oml2-server                 2.6.1-ubuntu3               OML measurement server

Could you give us the output of that same command on the machine running the OML server? It doesn't seem to be the EC, in your case.

Updated by Pieter Becue 3 months ago

Yes idd, our OML server runs on the AM (Our AM and EC run on different servers). Here's the output of dpkg -l \*oml\* :

||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu1 OML: The Orbit Measurement Library
un liboml2-dev <none> (no description available)
un oml-server <none> (no description available)
ii oml2-server 2.6.1-ubuntu3 OML measurement server

Also available in: Atom PDF