+jedimatt42 Posted March 22, 2020 (Author)

On 3/14/2020 at 11:01 AM, BeeryMiller said:
I did see this in the daemon.log file, but I have slept since, and I thought the connection issue hit after I left for work which would have been 6:30 EST, but it looks like the log file may be off an hour.

Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:22 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Started TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Stopping TI-99/4A DSR Service...
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:22 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:22 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:22 tipi3 systemd[1]: Started TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 systemd[1]: Stopping TI-99/4A DSR Service...
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Main process exited, code=killed, status=15/TERM
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Succeeded.
Mar 13 14:59:23 tipi3 systemd[1]: Stopped TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Start request repeated too quickly.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Failed with result 'start-limit-hit'.
Mar 13 14:59:23 tipi3 systemd[1]: Failed to start TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: Job for tipi.service failed.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: See "systemctl status tipi.service" and "journalctl -xe" for details.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Start request repeated too quickly.
Mar 13 14:59:23 tipi3 systemd[1]: tipi.service: Failed with result 'start-limit-hit'.
Mar 13 14:59:23 tipi3 systemd[1]: Failed to start TI-99/4A DSR Service.
Mar 13 14:59:23 tipi3 tipiwatchdog.sh[586]: Job for tipi.service failed.

Attachments: daemon.log (418.78 kB), tipi.log (2.87 MB), tipi.log.1 (4.88 MB)

So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error... just the watchdog service killing the tipi.service when the 4A sends the tipi-reset signal via cru. This usually only happens when a 4A resets to title screen.

Do you @BeeryMiller have code that is automating this reset via cru that might have gone haywire?

-M@
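Editor's sketch: one way to check whether a later failure coincides with the same kind of reset storm is to count the start-limit failures per minute in daemon.log. The function name `count_start_limit_hits` is made up for illustration, and the pattern assumes the log lines look like the excerpt quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count tipi.service 'start-limit-hit' failures per minute
# in a daemon.log-style file. Assumes the syslog layout "Mon DD HH:MM:SS host ...".
count_start_limit_hits() {
    # $1: path to the log file to scan
    # The dots around start-limit-hit stand in for the single quotes in the log.
    awk '/tipi\.service: Failed with result .start-limit-hit./ {
             key = $1 " " $2 " " substr($3, 1, 5)   # month, day, HH:MM
             hits[key]++
         }
         END { for (k in hits) print k, hits[k] }' "$1" | sort
}

# Run against a real log only when a path is given on the command line.
if [ -n "${1:-}" ]; then
    count_start_limit_hits "$1"
fi
```

Called as `sh count_hits.sh /var/log/daemon.log` (the script name is an assumption), it prints one line per minute in which systemd gave up restarting the service.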
+9640News Posted March 22, 2020

12 hours ago, jedimatt42 said:
So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error... just the watchdog service killing the tipi.service when the 4A sends the tipi-reset signal via cru. This usually only happens when a 4A resets to title screen. Do you @BeeryMiller have code that is automating this reset via cru that might have gone haywire? -M@

I went through and reviewed my code. When I first load and initialize the program, I do a reset of the TIPI. Afterwards, the reset of the TIPI does not take place again and that initialization process is not run again.

When a user hangs up, or is disconnected for being idle (3 minutes as programmed), then the port is closed. I then jump into a loop sending a message of >22,>00,>07 looking to accept a connection.

I copied your send/recv msg code from the DSR into my code, as running the code from the DSR itself must have been a timing issue with the Geneve: it would hang up in MDOS mode. I've been using that same code for the past year or so, so there is no reason to suspect the actual send/recv code.

Two things come to my mind. First, I can comment out the call that resets the TIPI. I will do that in the next few minutes and see how things play out. I think that code worked its way into the program when there was an issue some time ago. That may have been addressed with the 1.50 or 1.51 updates.

The only other thing that comes to my mind is whether the TIPI Accept command is doing some kind of reset somewhere in the TIPI DSR service.

Beery
+9640News Posted March 22, 2020

18 hours ago, jedimatt42 said:
So, at the same time as this, in the tipi.log, it looks like the service was being reset a bunch of times in a row... a little reset storm... there are no statements of error... just the watchdog service killing the tipi.service when the 4A sends the tipi-reset signal via cru. This usually only happens when a 4A resets to title screen. Do you @BeeryMiller have code that is automating this reset via cru that might have gone haywire? -M@

I've got my code up and running. So far, no issues after about 8 logins and timeouts. I will know tomorrow how the app runs.

Beery
+9640News Posted March 23, 2020

Just a FYI, removing the Reset routine did not solve the issue.

Beery
+jedimatt42 Posted March 23, 2020 (Author)

You'll find code in TIPI messaging called RESET, but that is handshaking code, not a signal-level reset. The constant in messaging called RESET should be renamed HELLO.

The *RESET signal that triggers what we see in the logs, the process restarts, is an actual wire from the CPLD to a PI GPIO input. I probably should have put an external pull-up resistor on it, but there is an internal one enabled at the receiving GPIO pin, inside the PI processor.

The code that triggers this, in all the code I have written, is only in the 4A powerup routine of my DSR.

Maybe there was a compound issue? Do the logs show the same reset storm at the time of failure?

-M@
+9640News Posted March 23, 2020

2 hours ago, jedimatt42 said:
You'll find code in TIPI messaging called RESET, but that is handshaking code, not a signal-level reset. The constant in messaging called RESET should be renamed HELLO. The *RESET signal that triggers what we see in the logs, the process restarts, is an actual wire from the CPLD to a PI GPIO input. I probably should have put an external pull-up resistor on it, but there is an internal one enabled at the receiving GPIO pin, inside the PI processor. The code that triggers this, in all the code I have written, is only in the 4A powerup routine of my DSR. Maybe there was a compound issue? Do the logs show the same reset storm at the time of failure? -M@

The *RESET signal you are referencing in the 4A powerup code is what I borrowed from your DSR code quite some time ago. I do not recall the circumstances, but there was some reason I needed it, or thought I needed it, because of some behavior I was experiencing. If I recall correctly, I think I was using it to force a socket disconnect from another system.

I will need to reboot the PI and repeat the run to see when I get the issue, and then review the logs. When I get more details, I will follow up.

Beery
+9640News Posted March 24, 2020

Matt,

I have attached two files. Access was lost after about 9 hours. At the very end of the tipi.log file, there is a break in time in the log. Around 3/24 at 06:18, I logged into the BBS to check its status. Later, from my workplace, I tried to connect and there was no connection.

In the daemon.log file, I am not sure what everything means after 06:18, when I logged in and then out.

From tipi.log....

2020-03-24 06:18:59,737 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,741 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,744 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,748 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,751 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,755 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,759 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,762 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,766 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,769 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,773 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:18:59,827 TiSocket : INFO wrote 1 bytes to socket: 1
2020-03-24 06:19:01,219 TiSocket : INFO closing socket
2020-03-24 06:19:01,220 TiSocket : INFO closed socket: 1
2020-03-24 06:19:02,627 ClockFile : INFO clock mode:corcomp
2020-03-24 06:19:02,666 ClockFile : INFO close special? PI.CLOCK
2020-03-24 08:58:01,535 TiSocket : INFO connection socket given handleId 1
2020-03-24 08:58:01,543 ClockFile : INFO clock mode:corcomp

Attachments: daemon.log, tipi.log
+9640News Posted April 6, 2020

Matt,

Not sure if you did something special with the services, but with 1.54, I ran for 36 hours without a problem. I'm running 1.55 now and things look good.

Beery
+jedimatt42 Posted April 7, 2020 (Author)

12 hours ago, BeeryMiller said:
Matt, Not sure if you did something special with the services, but with 1.54, I ran for 36 hours without a problem. I'm running 1.55 now and things look good. Beery

I didn't do anything. I haven't seen anything in the logs to indicate that anything is actually wrong. I don't even have a good guess.

If I were in your shoes, I'd be inclined to try things like closing the server socket and re-opening it after every hour of non-use... but that's not a solution to a root cause.

-M@
+9640News Posted April 7, 2020

4 hours ago, jedimatt42 said:
I didn't do anything. I haven't seen anything in the logs to indicate that anything is actually wrong. I don't even have a good guess. If I were in your shoes, I'd be inclined to try things like closing the server socket and re-opening it after every hour of non-use... but that's not a solution to a root cause. -M@

Hmmm. OK. It has me wondering if an earlier update may not have gone as expected, and the latest update corrected it. Just thinking out loud here.

Beery
+jedimatt42 Posted April 7, 2020 (Author)

2 hours ago, BeeryMiller said:
Hmmm. OK. It has me wondering if an earlier update may not have gone as expected, and the latest update corrected it. Just thinking out loud here. Beery

Not likely. Most updates don't do much except a 'git pull' and restart the services. For the service in question, you have restarted it many times.

It is more likely an intermittent network loss, or a full log file tmpfs (ramdisk).

You do single-byte-at-a-time messages, so it is possible that it creates a lot of logs... Logs are supposed to roll off, but the one time I tried to test that, I learned you shouldn't log at high speed to an SD card. That is why it logs to a tmpfs now. I don't think I ever verified the log roll after that. Something I can look into.

-M@
+9640News Posted April 9, 2020

On 4/7/2020 at 10:57 AM, jedimatt42 said:
Not likely. Most updates don't do much except a 'git pull' and restart the services. For the service in question, you have restarted it many times. It is more likely an intermittent network loss, or a full log file tmpfs (ramdisk). You do single-byte-at-a-time messages, so it is possible that it creates a lot of logs... Logs are supposed to roll off, but the one time I tried to test that, I learned you shouldn't log at high speed to an SD card. That is why it logs to a tmpfs now. I don't think I ever verified the log roll after that. Something I can look into. -M@

Well, yesterday, after the system had been running for about 5 days, the BBS crashed where it was waiting for input from a socket. This was about the time I had an internet connection failure. I did capture the log files, but ran out of time yesterday with some other priorities to post things. If you want to see the files, I will post them this evening; otherwise, I won't tie up space here on the forum.

I am not sure what size one can max out at on the tmpfs, if that is a setting you defined. If it were a tmpfs issue, could a flash drive be plugged into the PI to avoid the SD card issue, or do you think the logging would be just too much? I guess I could also ask whether it is possible to turn off some of the logging as a test, if you think that is an avenue worth exploring.

Now, on the intermittent network loss you also mention: I was having major problems yesterday with my AT&T service provider. I've got a service call for them to be out today as there are line issues. The network connection to the internet was up and down every 10 minutes. The service tech had me pull the power on the router and reboot it, plus they did some of their diagnostics on their end.

Anyways, just some feedback. I do realize my coding to use the TIPI as a BBS server is likely stress testing the system further than any other code at the moment. If it turns out to not be feasible, then I will know.

Beery
+jedimatt42 Posted April 9, 2020 (Author)

31 minutes ago, BeeryMiller said:
Well, yesterday, after the system had been running for about 5 days, the BBS crashed where it was waiting for input from a socket. This was about the time I had an internet connection failure. I did capture the log files, but ran out of time yesterday with some other priorities to post things. If you want to see the files, I will post them this evening; otherwise, I won't tie up space here on the forum. I am not sure what size one can max out at on the tmpfs, if that is a setting you defined. If it were a tmpfs issue, could a flash drive be plugged into the PI to avoid the SD card issue, or do you think the logging would be just too much? I guess I could also ask whether it is possible to turn off some of the logging as a test, if you think that is an avenue worth exploring. Now, on the intermittent network loss you also mention: I was having major problems yesterday with my AT&T service provider. I've got a service call for them to be out today as there are line issues. The network connection to the internet was up and down every 10 minutes. The service tech had me pull the power on the router and reboot it, plus they did some of their diagnostics on their end. Anyways, just some feedback. I do realize my coding to use the TIPI as a BBS server is likely stress testing the system further than any other code at the moment. If it turns out to not be feasible, then I will know. Beery

The correct thing to do is for me to make sure the logs never exceed the allocated space. It was either 30 MB or 100 MB... I will squeeze that in today or tomorrow.

-M@
+jedimatt42 Posted April 9, 2020 (Author)

I verified that the log rolling is working... as it is configured presently:

/var/log == 100 MB tmpfs
/var/log/tipi/tipi.log -> rolls at 5 MB, with a max of 5 backup files, so total max space consumed: 30 MB

The rest of the system logs are also on the system, but they appear to be using less than 10 MB in total, and log rolling for them is handled by the operating system.

----

Things you could try:

Ethernet... Use the Ethernet port on your PI instead of the WiFi... Ethernet doesn't go away out from under software the way that WiFi does...

Build a watchdog service... I imagine something like a dead-man switch for the BBS... a script running on the PI that, if network access from the PI is lost, drops a file in /home/tipi/tipi_disk/... that the 4A software can then periodically read, and perform appropriate resets...

I will need more time (a larger time window) to set up network failure testing, and make the TIPI software more robust against intermittent failures for hosting server sockets. No idea if I can succeed at this... or when.

-M@
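Editor's sketch of the dead-man switch idea above, as a rough illustration only. The flag file name NETDOWN, the check host, and the `--run` guard are all assumptions; only the /home/tipi/tipi_disk/ directory comes from the post. How the 4A program reads the flag and what it resets is left to the BBS code.

```shell
#!/bin/sh
# Hypothetical dead-man switch for the PI side.
# FLAG_FILE and CHECK_HOST are illustrative names, not part of TIPI.
FLAG_FILE="${FLAG_FILE:-/home/tipi/tipi_disk/NETDOWN}"
CHECK_HOST="${CHECK_HOST:-www.google.com}"

# Succeeds (exit status 0) when the check host answers a ping.
network_up() {
    ping -c 4 "$CHECK_HOST" > /dev/null 2>&1
}

# Create the flag file while the network is down, remove it once it is back.
update_flag() {
    if network_up; then
        rm -f "$FLAG_FILE"
    else
        date > "$FLAG_FILE"   # file content is just the time the outage was seen
    fi
}

# Only act when called with --run, e.g. from cron or a systemd timer.
if [ "${1:-}" = "--run" ]; then
    update_flag
fi
```

The 4A software would then poll for the flag file when no caller is connected and decide for itself whether to close and re-open its listening socket.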
+9640News Posted April 9, 2020

1 hour ago, jedimatt42 said:
Ethernet... Use the Ethernet port on your PI instead of the WiFi... Ethernet doesn't go away out from under software the way that WiFi does... Build a watchdog service... I imagine something like a dead-man switch for the BBS... a script running on the PI that, if network access from the PI is lost, drops a file in /home/tipi/tipi_disk/... that the 4A software can then periodically read, and perform appropriate resets... I will need more time (a larger time window) to set up network failure testing, and make the TIPI software more robust against intermittent failures for hosting server sockets. No idea if I can succeed at this... or when. -M@

Absolutely no problem, Matt, and thanks for the suggestions. Work on the things that matter to you.

It will be relatively easy to connect the sidecar PI to the ethernet connection. As far as a dead-man switch for the BBS, I know what you are describing, but I will have to research how that could even be accomplished. That's my problem to figure out.

If a monitor is plugged into the Raspberry PI 4+, does that have any impact on other services? The reason I ask is that when I was looking at things yesterday I plugged an HDMI monitor into it, but the monitor did not pick up any signal until the PI was rebooted. I didn't know if that would have slowed down the capabilities of the PI or not.
+arcadeshopper Posted April 9, 2020

Plugging in a monitor shouldn't have done anything to the pi.. at least in my network I have three pis here and they all lose their Wi-Fi connection sooner or later and have to be restarted, either restarting the Wi-Fi or restarting the pi completely. It could be my environment, which is very RF saturated... Anyway, when one of my pis is plugged in with ethernet it doesn't lose connection.. I wouldn't trust Wi-Fi connections for anything that you want to have reliable.

Sent from my LM-G820 using Tapatalk
+9640News Posted April 9, 2020

30 minutes ago, arcadeshopper said:
Plugging in a monitor shouldn't have done anything to the pi.. at least in my network I have three pis here and they all lose their Wi-Fi connection sooner or later and have to be restarted, either restarting the Wi-Fi or restarting the pi completely. It could be my environment, which is very RF saturated... Anyway, when one of my pis is plugged in with ethernet it doesn't lose connection.. I wouldn't trust Wi-Fi connections for anything that you want to have reliable. Sent from my LM-G820 using Tapatalk

Good to know. AT&T was out and replaced some connections, and I will switch from WiFi to Ethernet and restart the software. The tech even left me an extra new router at the house at no charge. I suspect my environment is very RF saturated as well.

Beery
+9640News Posted April 16, 2020

Matt,

As a FYI, the ethernet cable did not solve the issue of the PI losing connection.

I was thinking about how to restart things on the PI with your watchdog. At this point, I am not sure how this might be accomplished. One thing I do know is that the Re(B)oot option of TIPICFG does fix the issue when it hits.

I don't see the source for TIPICFG on GitHub, so I am not able to extract the routine that provides that capability. I assume it is not the same routine in the TIPI DSR that does the powerup. Can you share that routine? If so, I can have the PI periodically reboot if there is not a user connected.

I don't know if there is any correlation, but it actually seems the ethernet connection is having more issues than the WiFi connection. I'm losing the Ethernet connection multiple times daily, and I am pretty sure it dropped multiple times even though my internet connection did not drop during the time interval in question.

Beery
+jedimatt42 Posted April 16, 2020 (Author)

25 minutes ago, BeeryMiller said:
Matt, As a FYI, the ethernet cable did not solve the issue of the PI losing connection. I was thinking about how to restart things on the PI with your watchdog. At this point, I am not sure how this might be accomplished. One thing I do know is that the Re(B)oot option of TIPICFG does fix the issue when it hits. I don't see the source for TIPICFG on GitHub, so I am not able to extract the routine that provides that capability. I assume it is not the same routine in the TIPI DSR that does the powerup. Can you share that routine? If so, I can have the PI periodically reboot if there is not a user connected. I don't know if there is any correlation, but it actually seems the ethernet connection is having more issues than the WiFi connection. I'm losing the Ethernet connection multiple times daily, and I am pretty sure it dropped multiple times even though my internet connection did not drop during the time interval in question. Beery

Source for tipicfg is in the TIPI repo, under clients/tipicfg.

But to reboot the PI from the TI, open PI.REBOOT for output, and close the file.

To tell that the reboot is done, wait a little bit.. 10 seconds probably, then try to read a file... PI.STATUS if you like. If the reboot is not done, the read will block until it has completed...

-M@
+9640News Posted April 16, 2020

8 hours ago, jedimatt42 said:
Source for tipicfg is in the TIPI repo, under clients/tipicfg. But to reboot the PI from the TI, open PI.REBOOT for output, and close the file. To tell that the reboot is done, wait a little bit.. 10 seconds probably, then try to read a file... PI.STATUS if you like. If the reboot is not done, the read will block until it has completed... -M@

Thanks. I saw the open of PI.REBOOT, but thought there was more to it than that and that I was missing something. Hopefully, I have time to code that tonight.

Beery
+9640News Posted April 16, 2020

Going to post this as a reference for others that may be considering any server functionality on the PI/TIPI.

This time, I had an appropriate set of keywords with Google, "raspberry pi detecting a loss of internet", and came up with this:

https://weworkweplay.com/play/rebooting-the-raspberry-pi-when-it-loses-wireless-connection-wifi/

This looks simpler to implement and does not require any TI-99/4A CPU cycles to do a PI.REBOOT. However, as I think about it more, I don't think it is going to be perfect "as is" either. I think the CRON job on the PI is going to need a shorter interval than 5 minutes. If the timing hit just right, I think my router can re-establish an internet connection in < 5 minutes, so it may need to run once every minute.

I know that when the BBS is in server mode waiting for a connection and the problem hits, no incoming connection is ever found on the port, for some "cause" still to be determined. Right now, the presumption is it may be related to a dropped internet connection. If that is the case, then this CRON job will/should be sufficient.

I know I can still open an outbound socket, as TIPICFG does read some stats when it is loaded after the server side has decided to not respond.

As the webpage above describes WiFi connectivity, I think what may really be needed when running ethernet to the PI is to ping an IP address somewhere else on the internet.

I should note that probably only someone running something in a server type environment should consider this CRON job for the TI-99/4A or Geneve. Otherwise, someone using the TIPI predominantly for file storage may suddenly discover their PI is rebooting when they are in the middle of file access, etc.

Beery
+arcadeshopper Posted April 16, 2020

I was going to say maybe rebooting the pi was overkill and just restarting the TIPI process was a better idea.

Sent from my LM-G820 using Tapatalk
+9640News Posted April 17, 2020

2 hours ago, arcadeshopper said:
I was going to say maybe rebooting the pi was overkill and just restarting the TIPI process was a better idea. Sent from my LM-G820 using Tapatalk

Let's say I have this script running:

ping -c4 www.google.com > /dev/null
if [ $? != 0 ]
then
  sudo ?????????
fi

I am assuming I would use the sudo command, but what do I place after the sudo to restart the TIPI service?

I chose to use www.google.com as that is a website very unlikely to ever go down, whereas the example I gave above with the link used a router IP address.

For others, I am anticipating putting that script into a crontab so that it runs every minute.

Beery

Edited April 17, 2020 by BeeryMiller
+retroclouds Posted April 17, 2020

5 hours ago, BeeryMiller said:
Let's say I have this script running: ping -c4 www.google.com > /dev/null if [ $? != 0 ] then sudo ????????? fi I am assuming I would use the sudo command, but what do I place after the sudo to restart the TIPI service? I chose to use www.google.com as that is a website very unlikely to ever go down, whereas the example I gave above with the link used a router IP address. For others, I am anticipating putting that script into a crontab so that it runs every minute. Beery

I guess that would be:

systemctl restart tipi.service

To check the status:

systemctl status tipi.service

Note: instead of cron you could also set up a systemd timer, but that is just a matter of taste I'd say.
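Editor's sketch: putting the ping check and the suggested `systemctl restart tipi.service` together might look like the script below. The script path in the crontab comment and the `--run` guard are assumptions for illustration, and whether a service restart actually clears the hang had not yet been confirmed at this point in the thread.

```shell
#!/bin/sh
# Hypothetical connectivity check: restart tipi.service when the ping fails.
# PING_HOST defaults to the host chosen earlier in the thread.
PING_HOST="${PING_HOST:-www.google.com}"

# Succeeds (exit status 0) when the host answers.
network_up() {
    ping -c4 "$PING_HOST" > /dev/null 2>&1
}

# Restart the TIPI service; sudo is not needed if run from root's crontab.
restart_tipi() {
    sudo systemctl restart tipi.service
}

check_and_restart() {
    if ! network_up; then
        restart_tipi
    fi
}

# Example crontab entry, once a minute (the path is an assumption):
# * * * * * /home/tipi/check_net.sh --run
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

Keeping the ping and the restart in separate functions makes it easy to swap the restart for a full reboot, or for a different check host, without touching the rest of the script.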
+9640News Posted April 17, 2020

Thanks for the info. The BBS is up and running right now, so when it loses the ability to pick up a connection, I am going to exit the program, telnet into the system, check the service status, restart the service, check the status again, and then reload the BBS program to confirm that restarting the TIPI service did the trick.

Beery