
TIPI - TI-99/4A to Raspberry PI interface development

On 3/14/2020 at 11:01 AM, BeeryMiller said:

I did see this in the daemon.log file, but I have slept since then. I thought the connection issue hit after I left for work, which would have been 6:30 EST, but it looks like the log file may be off by an hour.

 

Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:22 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Started TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Stopping TI-99/4A DSR Service...
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:22 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Started TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 systemd[1]: Stopping TI-99/4A DSR Service...
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:23 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Start request repeated too quickly.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Failed with result 'start-limit-hit'.
Mar 13 14:59:23 tipi3 systemd[1]: Failed to start TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: Job for tipi.service failed.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: See "systemctl status tipi.service" and "journalctl -xe" for details.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Start request repeated too quickly.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Failed with result 'start-limit-hit'.
Mar 13 14:59:23 tipi3 systemd[1]: Failed to start TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: Job for tipi.service failed.

 

Attachments: daemon.log (418.78 kB), tipi.log (2.87 MB), tipi.log.1 (4.88 MB)

 

So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error, just the watchdog service killing tipi.service when the 4A sends the tipi-reset signal via CRU. This usually only happens when the 4A resets to the title screen.
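For anyone following along, the mechanism is roughly this — a minimal Python sketch of the idea only (the real watchdog is the tipiwatchdog.sh shell script seen in the log above, and the pin number here is made up):

import subprocess
import RPi.GPIO as GPIO

RESET_PIN = 26  # hypothetical; whichever GPIO the TIPI reset line is wired to

GPIO.setmode(GPIO.BCM)
GPIO.setup(RESET_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)

while True:
    # Block until the 4A pulls the reset line low via CRU.
    GPIO.wait_for_edge(RESET_PIN, GPIO.FALLING)
    # Restart the DSR service; rapid repeats produce the Stopped/Started
    # pairs in daemon.log until systemd hits its start limit.
    subprocess.run(["systemctl", "restart", "tipi.service"])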

 

@BeeryMiller, do you have code automating this reset via CRU that might have gone haywire?

 

[email protected] 

 

12 hours ago, jedimatt42 said:

 

So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error, just the watchdog service killing tipi.service when the 4A sends the tipi-reset signal via CRU. This usually only happens when the 4A resets to the title screen.

 

@BeeryMiller, do you have code automating this reset via CRU that might have gone haywire?

 

[email protected] 

 

 

I went through and reviewed my code.  When I first load and initialize the program, I do a reset of the TIPI.  Afterwards, the TIPI is not reset again, and that initialization process is not run again.

 

When a user hangs up, or is disconnected for being idle (3 minutes as programmed), the port is closed.  I then jump into a loop sending a message of >22,>00,>07, looking to accept a connection.  I copied your send/recv message code from the DSR into my own code, since running the code from the DSR itself must have had a timing issue with the Geneve; it would hang in MDOS mode.  I've been using that same code for the past year or so, so there is no reason to suspect the actual send/recv code.
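In Python terms, the pattern that loop implements looks roughly like this — an illustration only, since the real code is 4A-side assembly issuing the >22,>00,>07 message, and the port and session handler here are hypothetical:

import socket

def serve_caller(conn):
    # Hypothetical stand-in for the BBS session logic.
    conn.sendall(b"hello\r\n")

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", 9640))  # hypothetical BBS port
server.listen(1)
server.settimeout(1.0)   # poll instead of blocking forever

while True:
    try:
        conn, addr = server.accept()   # the ">22,>00,>07 accept" step
    except socket.timeout:
        continue                       # nobody waiting yet; poll again
    serve_caller(conn)
    conn.close()                       # hang-up or 3-minute idle timeout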

 

Two things come to mind.  First, I can comment out the call that resets the TIPI.  I will do that in the next few minutes and see how things play out.  I think that code worked its way into the program when there was an issue some time ago; it may have been addressed by the 1.50 or 1.51 updates.

 

The only other thing that comes to mind is whether the TIPI Accept command might be doing some kind of reset of the TIPI DSR service somewhere.

 

Beery

 

18 hours ago, jedimatt42 said:

 

So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error, just the watchdog service killing tipi.service when the 4A sends the tipi-reset signal via CRU. This usually only happens when the 4A resets to the title screen.

 

@BeeryMiller, do you have code automating this reset via CRU that might have gone haywire?

 

[email protected] 

 

I've got my code up and running.  So far, no issues after about 8 logins and timeouts.  I'll know by tomorrow how the app runs.


Beery



You'll find code in TIPI messaging called RESET, but that is handshaking code, not a signal-level reset. The constant in messaging called RESET should be renamed HELLO.

 

The *RESET signal that triggers the process restarts we see in the logs is an actual wire from the CPLD to a PI GPIO input. I probably should have put an external pull-up resistor on it, but there is an internal one enabled at the receiving GPIO pin, inside the PI processor. The code that triggers this, in all the code I have written, is only in the 4A powerup routine of my DSR.
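Concretely, the internal pull-up mentioned here is enabled in software; with the RPi.GPIO library it would look like the following (a sketch; the pin number is hypothetical):

import RPi.GPIO as GPIO

RESET_PIN = 26  # hypothetical pin for the CPLD *RESET wire
GPIO.setmode(GPIO.BCM)
# PUD_UP enables the pull-up inside the PI's SoC, standing in for the
# external resistor that was left off the board; the line then idles high.
GPIO.setup(RESET_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP)
print(GPIO.input(RESET_PIN))  # reads 1 while the 4A is not asserting reset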

 

Maybe there was a compound issue? Do the logs show the same reset storm at the time of failure? 

 

[email protected]

2 hours ago, jedimatt42 said:

You'll find code in TIPI messaging called RESET, but that is handshaking code, not a signal-level reset. The constant in messaging called RESET should be renamed HELLO.

 

The *RESET signal that triggers the process restarts we see in the logs is an actual wire from the CPLD to a PI GPIO input. I probably should have put an external pull-up resistor on it, but there is an internal one enabled at the receiving GPIO pin, inside the PI processor. The code that triggers this, in all the code I have written, is only in the 4A powerup routine of my DSR.

 

Maybe there was a compound issue? Do the logs show the same reset storm at the time of failure? 

 

[email protected]

 

The *RESET signal you are referencing in the 4A powerup code is what I borrowed from your DSR code quite some time ago.  I do not recall the circumstances, but there was some reason I needed it, or thought I needed it, based on behavior I was experiencing.  If I recall correctly, I was using it to force a socket disconnect from another system.

 

I will need to reboot the PI and repeat the run to see when the issue recurs so I can review the logs.  When I have more details, I will follow up.

 

Beery

 



Matt,

 

I have attached two files.  Access was lost after about 9 hours.  In the tipi.log file, at the very end, there is a break in time in the log.  Around 3/24 at 06:18, I logged into the BBS to check its status.  Later, from my workplace, I tried to connect and there was no connection.  In the daemon.log file, I'm not sure what all the activity after the 06:18 login and logout means.

 

From tipi.log....

 

2020-03-24 06:18:59,737 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,741 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,744 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,748 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,751 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,755 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,759 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,762 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,766 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,769 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,773 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:18:59,827 TiSocket    : INFO     wrote 1 bytes to socket: 1
2020-03-24 06:19:01,219 TiSocket    : INFO     closing socket
2020-03-24 06:19:01,220 TiSocket    : INFO     closed socket: 1
2020-03-24 06:19:02,627 ClockFile   : INFO     clock mode:corcomp
2020-03-24 06:19:02,666 ClockFile   : INFO     close special? PI.CLOCK
2020-03-24 08:58:01,535 TiSocket    : INFO     connection socket given handleId 1
2020-03-24 08:58:01,543 ClockFile   : INFO     clock mode:corcomp

 

Attachments: daemon.log, tipi.log


Matt,

 

Not sure if you did something special with the services, but with 1.54, I ran for 36 hours without a problem.  I'm running 1.55 now and things look good.

 

Beery

 

12 hours ago, BeeryMiller said:

Matt,

 

Not sure if you did something special with the services, but with 1.54, I ran for 36 hours without a problem.  I'm running 1.55 now and things look good.

 

Beery

 

 

I didn't do anything :) I haven't seen anything in the logs to indicate that anything is actually wrong... :(

 

I don't even have a good guess. 

 

If I were in your shoes, I'd be inclined to try things like closing the server socket and re-opening it after every hour of non-use... but that's not a solution to the root cause.
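Extending the accept-loop sketch from earlier in the thread, that workaround would look something like this in Python (hypothetical names and port; not TIPI's actual code):

import socket
import time

IDLE_LIMIT = 3600  # recycle the listener after an hour of non-use

def open_server(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", port))
    s.listen(1)
    s.settimeout(1.0)  # wake up periodically to check the idle clock
    return s

server = open_server(9640)  # hypothetical port
last_use = time.time()
while True:
    try:
        conn, addr = server.accept()
        last_use = time.time()
        # ... session handling elided ...
        conn.close()
    except socket.timeout:
        if time.time() - last_use > IDLE_LIMIT:
            server.close()              # drop and rebuild the server socket
            server = open_server(9640)
            last_use = time.time()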

 

[email protected]

4 hours ago, jedimatt42 said:

 

I didn't do anything :) I haven't seen anything in the logs to indicate that anything is actually wrong... :(

 

I don't even have a good guess. 

 

If I were in your shoes, I'd be inclined to try things like closing the server socket and re-opening it after every hour of non-use... but that's not a solution to the root cause.

 

[email protected]

Hmmm.  OK.  Has me wondering if an earlier update may not have gone as expected, and the latest update corrected it.  Just thinking out loud here.

 

Beery

2 hours ago, BeeryMiller said:

Hmmm.  OK.  Has me wondering if an earlier update may not have gone as expected, and the latest update corrected it.  Just thinking out loud here.

 

Beery

Not likely. Most updates don't do much except a 'git pull' and a restart of the services.

 

For the service in question, you have restarted it many times.

 

It is more likely an intermittent network loss, or a full log-file tmpfs (ramdisk).

 

You do single-byte-at-a-time messages, so it's possible that creates a lot of log output...

 

Logs are supposed to roll off, but the one time I tried to test that, I learned you shouldn't log at high speed to an SD card. That is why it logs to a tmpfs now.
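For reference, the tipi.log format looks like Python's standard logging, where roll-off is normally set up with a RotatingFileHandler — a sketch only, with the path and sizes illustrative rather than TIPI's actual configuration:

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    "/var/log/tipi/tipi.log",   # illustrative tmpfs-backed path
    maxBytes=5 * 1024 * 1024,   # roll the file at 5 MB
    backupCount=2,              # keep tipi.log.1 and tipi.log.2 around
)
logging.getLogger().addHandler(handler)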

 

I don't think I ever verified the log roll after that. 

 

Something I can look into.

 

[email protected]

