Effective IT monitoring for a large enterprise (5000+ users)

by Dmitry Elisov
Oqtacore CTO


IT staff at any large company know the moment when every phone starts ringing at once: business-critical software is down and a business process has stopped. System administrators comb through the logs to work out what happened. Time passes, tempers flare, reputations suffer … Can such situations be avoided altogether? Unfortunately not, but their impact can be minimized. In this article, we look at several tools that help monitor the health of all automated systems in a company.

 

Installing Zabbix monitoring system

A Zabbix-based monitoring system in its simplest form requires one server, which in the vast majority of cases runs Linux. As an example, we take Ubuntu Linux and install Apache, MySQL and PHP on it. To do this, run the following commands in the terminal:

sudo apt-get update
sudo apt-get install apache2 libapache2-mod-php
sudo apt-get install mysql-server
sudo apt-get install php php-mbstring php-gd php-xml php-bcmath php-ldap php-mysql

 

We add the Zabbix repository (example for Ubuntu 18.04):

wget https://repo.zabbix.com/zabbix/4.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_4.0-3+bionic_all.deb
sudo dpkg -i zabbix-release_4.0-3+bionic_all.deb

 

Our system is now ready to install the Zabbix backend. We start the installation process:

sudo apt-get update
sudo apt-get install zabbix-server-mysql zabbix-frontend-php zabbix-agent

 

 

Once the installation has completed, we need to create a database for Zabbix:

mysql -u root -p

mysql> CREATE DATABASE zabbixdb character set utf8 collate utf8_bin;

mysql> CREATE USER 'zabbix'@'localhost' IDENTIFIED BY 'password';

mysql> GRANT ALL PRIVILEGES ON zabbixdb.* TO 'zabbix'@'localhost' WITH GRANT OPTION;

mysql> FLUSH PRIVILEGES;

 

 

Let’s load the database schema into the newly created database:

cd /usr/share/doc/zabbix-server-mysql
zcat create.sql.gz | mysql -u zabbix -p zabbixdb

 

 

The final step is to edit the configuration file /etc/zabbix/zabbix_server.conf:

DBHost=localhost
DBName=zabbixdb
DBUser=zabbix
DBPassword=password
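The same edit can be scripted, which is handy when the server is rebuilt often. The following is only a minimal sketch and not part of the Zabbix packages: set_conf_key is a hypothetical helper that drops any active KEY= line and appends the new value, leaving the commented examples in the file untouched. It assumes that key names are plain alphanumeric words.

```shell
#!/bin/sh
# set_conf_key FILE KEY VALUE  (hypothetical helper, not shipped with Zabbix)
# Removes active (uncommented) "KEY=..." lines and appends "KEY=VALUE".
set_conf_key() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line except active assignments of KEY.
    grep -v "^${key}=" "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Run as root against the real file, it would be called as set_conf_key /etc/zabbix/zabbix_server.conf DBName zabbixdb, and likewise for the other three keys.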

 

 

We reboot the server and then open the following address from a remote computer:

http://host_name/zabbix/

where host_name is the name or IP address of your Zabbix server. If everything was done correctly, we are greeted by the Zabbix server configuration wizard.

After clicking through the wizard with the Next step button, the installation is complete and we can log in to the monitoring system as the administrator.

This article does not claim to be a detailed description of the Zabbix installation process. If you have questions, your favorite search engine will have the answers. The point here is that deploying Zabbix is not difficult.

Zabbix can monitor the state of a remote computer or server through an agent, and it can check the availability and response of standard services such as SMTP or HTTP without installing any software on the target. The full picture of a host's state, however, is only available through the cross-platform Zabbix agent. Installing it on every server is tedious, even though each installation is quick, but it is mandatory on strategically important hosts. Only then can we display processor, memory and disk load, the state of all running processes and much more in real time.

To install the agent on a Debian system, execute the following commands in sequence:

wget https://repo.zabbix.com/zabbix/4.0/debian/pool/main/z/zabbix-release/zabbix-release_4.0-2+stretch_all.deb
dpkg -i zabbix-release_4.0-2+stretch_all.deb
apt update
apt install zabbix-agent
service zabbix-agent start

 

For other operating systems, all commands are described in the Zabbix documentation on the official project site.

The best part about IT monitoring systems is that they can notify you of problems. If a router stops responding, HTTP becomes unavailable, or the mail server's disk is 95% full, you receive a signal immediately, one that helps you fix the problem before end users notice it. The system administrator defines the situations that should raise an alarm by creating triggers. Triggers are written as compact expressions:

{SRV-FS:vfs.fs.size[C:,pfree].last(0)}<1 – SRV-FS has less than 1% free space on the C drive.

{Srv-GDLite:icmpping.max(#3)}=0 – the Srv-GDLite server did not respond to ping 3 times in a row

{FCP:proc.num[chrome.exe, Administrator].last(#1)}=1 – 1 instance of the chrome.exe process is running on the FCP computer as Administrator
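To make the second expression concrete: icmpping yields 1 when the host answers and 0 when it does not, and max(#3) takes the largest of the last three collected values, so the trigger fires only when all three are 0. The sketch below emulates that evaluation on a list of values. The function names are hypothetical, and this illustrates the logic only, not how Zabbix implements it.

```shell
#!/bin/sh
# max_last_n N VALUE...: print the largest of the last N values.
max_last_n() {
    n=$1
    shift
    # Discard older values until only the last N remain.
    while [ "$#" -gt "$n" ]; do
        shift
    done
    max=$1
    for v in "$@"; do
        if [ "$v" -gt "$max" ]; then
            max=$v
        fi
    done
    printf '%s\n' "$max"
}

# host_down VALUE...: succeeds when max(#3) of the ping history equals 0,
# that is, when the last three pings all failed.
host_down() {
    [ "$(max_last_n 3 "$@")" -eq 0 ]
}
```

With the history 1 0 0 0 (oldest first) host_down succeeds, while 0 0 1 does not, because the most recent ping was answered.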

A huge number of triggers are included in the system out of the box, so the administrator only has to choose the notification methods. What types of notification does Zabbix offer? It can show the problem on the dashboard, send an email, or call an external script that rings a bell or sends an SMS. There are many options.
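As an illustration of the external-script option, here is a minimal sketch of an alert script. It assumes the media type is configured to pass three parameters (recipient, subject, message, typically the {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} macros) and that the script is placed in the directory named by AlertScriptsPath in the server configuration. The log path and the function name are hypothetical, and the script only records the alert; ringing a bell or calling an SMS gateway would go in the same place.

```shell
#!/bin/sh
# Hypothetical alert script: appends every alert to a log file.
LOG_FILE=${LOG_FILE:-/tmp/zabbix_alerts.log}   # assumed location

# format_alert RECIPIENT SUBJECT MESSAGE: print one timestamped log line.
format_alert() {
    printf '%s to=%s subject="%s" message="%s"\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3"
}

# Act only when called with the three expected parameters.
if [ "$#" -ge 3 ]; then
    format_alert "$1" "$2" "$3" >> "$LOG_FILE"
fi
```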

Writing a Telegram bot for IT monitoring system

Teaching Zabbix to send email is quite simple, but delivering alerts to Telegram is more interesting. Let's write a simple bot that checks a mailbox and forwards all messages from Zabbix to Telegram. I chose C# as the programming language and Visual Studio as the development environment. It is a console application, since we do not need a graphical interface. For this task we need the following NuGet packages:

  • AE.Net.Mail – for working with mail.
  • Newtonsoft.Json – JSON parsing for Telegram.
  • RestSharp – a simple REST and HTTP API client.

Let’s implement a simple Telegram API for receiving / sending messages.

using RestSharp;

namespace BotZabbix
{
    class TelegramAPI
    {
        // Base URL of the Bot API. static readonly (rather than const) also
        // compiles when AppSettings.BotToken is not a compile-time constant.
        static readonly string API_URL = "https://api.telegram.org/bot" + AppSettings.BotToken + "/";

        // Calls a Bot API method with an already URL-encoded query string
        // and returns the raw JSON reply.
        public string sendAPIRequest(string _apiMethod, string _params)
        {
            RestClient RC = new RestClient();
            var Url = API_URL + _apiMethod + "?" + _params;
            var Request = new RestRequest(Url);
            var Response = RC.Get(Request);

            return Response.Content;
        }
    }

    // Envelope of a getUpdates reply.
    public class APIResult
    {
        public Update[] Result { get; set; }
    }

    public class Update
    {
        public int update_id { get; set; }
        public Message message { get; set; }
    }

    public class Message
    {
        public Chat chat { get; set; }
        public string text { get; set; }
    }

    public class Chat
    {
        // Telegram chat ids can exceed the 32-bit range, so long is safer than int.
        public long id { get; set; }
        public string first_name { get; set; }
        public string last_name { get; set; }
    }
}

 

 

Here AppSettings.BotToken is the token that Telegram issues when creating a bot.
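Because sendAPIRequest puts its parameters straight into the query string, every value has to be percent-encoded first, otherwise a space or an ampersand in an alert text breaks the request. The shell sketch below shows the idea outside the bot. The helper names urlencode and send_message_url are hypothetical, and only the URL layout (bot token, method name, chat_id and text parameters) comes from the code above.

```shell
#!/bin/sh
# urlencode STRING: percent-encode every byte except RFC 3986 unreserved
# characters. A trailing newline in STRING is dropped.
urlencode() {
    printf '%s' "$1" | LC_ALL=C awk '
        BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
        {
            if (NR > 1) out = out "%0A"    # newline between input lines
            n = length($0)
            for (j = 1; j <= n; j++) {
                c = substr($0, j, 1)
                if (c ~ /^[A-Za-z0-9._~-]$/) out = out c
                else out = out sprintf("%%%02X", ord[c])
            }
        }
        END { printf "%s", out }'
}

# send_message_url TOKEN CHAT_ID TEXT: build the sendMessage request URL.
send_message_url() {
    printf 'https://api.telegram.org/bot%s/sendMessage?chat_id=%s&text=%s\n' \
        "$1" "$(urlencode "$2")" "$(urlencode "$3")"
}
```

With a real token and chat id in BOT_TOKEN and CHAT_ID, a test message could then be sent from the terminal with curl -s "$(send_message_url "$BOT_TOKEN" "$CHAT_ID" "Zabbix test")".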

If you call the sendAPIRequest method with the following parameters:

sendAPIRequest("sendMessage", $"chat_id={_idChat}&text={BotHelpers.CheckText(_text)}")

 

then we will send a message to the user with id = _idChat and the text stored in the _text variable. To receive messages, you need to call the above method as follows:

sendAPIRequest("getUpdates", $"offset={lastUpdateId}")

 

where lastUpdateId is the id of the last received message.
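One detail worth knowing: the Telegram Bot API documentation describes offset as the identifier of the first update to return, which must be one greater than the highest update_id already received, so that older updates are treated as confirmed. The sketch below derives that value from a raw getUpdates reply. The function name next_offset is hypothetical, and the text matching is deliberately naive; a real bot should use a JSON parser, as the C# code does.

```shell
#!/bin/sh
# next_offset JSON: print (highest update_id in the reply) + 1,
# or nothing when the reply contains no updates.
next_offset() {
    max=$(printf '%s' "$1" |
        grep -o '"update_id":[[:space:]]*[0-9][0-9]*' |
        grep -o '[0-9][0-9]*$' |
        sort -n | tail -n 1)
    if [ -n "$max" ]; then
        echo $((max + 1))
    fi
}
```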

To implement the process of receiving email, we will write the following class:

 

using System;
using System.Collections.Generic;

namespace BotZabbix
{
    class Email
    {
        // Returns the bodies of all messages sent by Zabbix and deletes
        // every message that was fetched.
        public List<string> GetMail()
        {
            int MailCount = 0;
            List<string> Mails = new List<string>();

            try
            {
                using (AE.Net.Mail.ImapClient ic = new AE.Net.Mail.ImapClient(
                    AppSettings.MailServer,
                    AppSettings.MailUser,
                    AppSettings.MailPassword,
                    AE.Net.Mail.AuthMethods.Login,
                    AppSettings.MailServerPort,
                    true))
                {
                    ic.SelectMailbox("INBOX");
                    // false requests full messages rather than headers only.
                    AE.Net.Mail.MailMessage[] mm = ic.GetMessages(0, 50, false);

                    foreach (AE.Net.Mail.MailMessage m in mm)
                    {
                        MailCount++;
                        if (m.From.ToString().Contains("Zabbix"))
                        {
                            Mails.Add(m.Body);
                        }
                        else
                        {
                            // Log unexpected mail before it is deleted.
                            Logger.Log(m.From.ToString());
                            Logger.Log(m.Body);
                        }
                        ic.DeleteMessage(m);
                    }
                    Logger.Log($"::: {DateTime.Now} Mail count: {MailCount}");
                    // No explicit Dispose() call: the using block disposes the client.
                }
            }
            catch (Exception _ex)
            {
                Logger.Log($"::: {DateTime.Now} Error: " + _ex.Message);
            }

            return Mails;
        }
    }
}

 

As the code shows, the GetMail() method connects to the configured mailbox, looks through the Inbox folder for messages from the sender "Zabbix", returns their bodies and deletes them from the mailbox. Messages from any other sender are logged and deleted as well. The main body of the bot receives the letters in the Mails variable and forwards them to every interested Telegram user.

What other functionality to give the bot is limited only by imagination. In my case, the bot admits users only with a password, logs all its activity, and accepts a ping command for a detailed analysis of the network state.

The full version of the bot can be viewed on github: https://github.com/ishmuratov/ZabbixBot.

 

Writing a ping tool for IT monitoring system

What do we end up with? An IT monitoring system that tracks the most important parameters of the company's servers, plus notifications by email and Telegram. But what if the Zabbix server itself fails? It cannot send a letter about its own death … C# comes to our aid again. In Visual Studio we write a small utility called PingTool that cyclically polls a list of hosts. If a host does not answer for some time, PingTool sends a letter on behalf of Zabbix, which the bot described above immediately relays to Telegram.

To implement this, we need a method that sends an email and a programmatic analogue of the ping command.

 

using System.Net.NetworkInformation;

class Pinger
{
    // Sends one ICMP echo request and reports whether the host replied.
    public static bool PingHost(string nameOrAddress)
    {
        bool pingable = false;
        Ping pinger = null;

        try
        {
            pinger = new Ping();
            PingReply reply = pinger.Send(nameOrAddress);
            pingable = reply.Status == IPStatus.Success;
        }
        catch (PingException)
        {
            // A failed request (for example, an unresolvable name) counts as "no reply".
            return false;
        }
        finally
        {
            if (pinger != null)
            {
                pinger.Dispose();
            }
        }

        return pingable;
    }
}

 

If the host nameOrAddress responds, the PingHost method of the Pinger class returns true; otherwise it returns false. If an address does not respond for a certain period of time (for example, 10 seconds), it is marked as unreachable, which is the reason for sending a letter.
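The "does not respond for a certain period" rule can be sketched in a few lines of shell: a host counts as down only after a given number of failed probes in a row, and any reply ends the check early. This is an illustration of the idea rather than the PingTool code. The function names are hypothetical, and the ping options (-c for the packet count, -W for the reply timeout in seconds) are those of the Linux iputils ping.

```shell
#!/bin/sh
# probe HOST: one echo request with a 2-second reply timeout.
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# is_down HOST LIMIT [INTERVAL]
# Succeeds (host considered down) when HOST fails LIMIT probes in a row,
# pausing INTERVAL seconds after each failure; any reply ends the check.
is_down() {
    host=$1 limit=$2 interval=${3:-1}
    failures=0
    while [ "$failures" -lt "$limit" ]; do
        if probe "$host"; then
            return 1    # the host answered, so it is not down
        fi
        failures=$((failures + 1))
        sleep "$interval"
    done
    return 0
}
```

For example, is_down 192.0.2.10 5 2 reports a host as down after five unanswered probes spaced two seconds apart, which is roughly the 10-second window mentioned above plus the probe timeouts.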

 

using System;
using System.Net;
using System.Net.Mail;

class MailSender
{
    // Sends _letter from _fromAddress to _toAddress, with "Zabbix" as the sender's display name.
    public void MailSend(string _fromAddress, string _fromPassword, string _toAddress, Letter _letter)
    {
        var fromMail = new MailAddress(_fromAddress, "Zabbix");
        var toMail = new MailAddress(_toAddress, string.Empty);
        var smtp = new SmtpClient
        {
            Host = "smtp.gmail.com",
            Port = 587,
            EnableSsl = true,
            DeliveryMethod = SmtpDeliveryMethod.Network,
            UseDefaultCredentials = false,
            Credentials = new NetworkCredential(fromMail.Address, _fromPassword)
        };
        using (var message = new MailMessage(fromMail, toMail)
        {
            Subject = _letter.Subject,
            Body = _letter.Body
        })
        {
            try
            {
                smtp.Send(message);
            }
            catch (Exception _ex)
            {
                Logger.Log($"::: {DateTime.Now} Error: " + _ex.Message);
            }
        }
    }
}

 

The MailSend() method sends an email from the _fromAddress address to the _toAddress address. The method was written for Gmail, but it is easy to adapt to another mail service by changing the corresponding settings of the SmtpClient instance.

 

class Letter
{
    public string Subject { get; set; }
    public string Body { get; set; }
}

 

The Letter class contains only two string properties, Subject and Body, which store the subject and body of the letter, respectively.

The full version of the program can be viewed on github: https://github.com/ishmuratov/PingTool.

After lengthy trials, our company settled on the following setup. The Zabbix server is deployed in a cloud data center that guarantees 99% uptime and has access to all the company's servers. The Telegram bot runs in the Google cloud, which guarantees 99.9% uptime and Internet access. The PingTool utility runs at the head office and can send letters over the 3G channel of the country's largest cellular operator. Thanks to the IT monitoring system, we have significantly increased the fault tolerance of all critical nodes and keep the business efficient through the continuous operation of equipment and software.
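To put those guarantees in perspective, an uptime percentage converts directly into a yearly downtime allowance. The short sketch below does the arithmetic for a 365-day year; the function name is hypothetical.

```shell
#!/bin/sh
# downtime_hours UPTIME_PERCENT: hours per 365-day year (8760 h) that a
# service may be down while still meeting the stated uptime guarantee.
downtime_hours() {
    LC_ALL=C awk -v up="$1" 'BEGIN { printf "%.2f\n", (100 - up) / 100 * 8760 }'
}

downtime_hours 99      # 87.60 hours, about 3.65 days
downtime_hours 99.9    # 8.76 hours
```

A 99% guarantee therefore still allows the Zabbix server several days of downtime per year, which is why an independent watcher such as PingTool on a separate channel is worth having.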

 

 
