Grid as the fourth stage of informatization development

(This article was published and discussed in the newspaper «Зеркало недели - Mirror of a Week», No. 8, 2007.)

In the coming decades, personal computers, servers, local networks and other things familiar to today's specialists may fall out of use, because computing and information services will become utilities, like electricity and water supply today, and individual computers with multi-core processors will dissolve into a global Grid information infrastructure. Grid technologies were originally intended for complex scientific and engineering problems that cannot be solved in reasonable time on individual computing devices. But their field of application does not end with these problem types. As it develops, Grid is entering industry and business and claims the role of a universal data-processing infrastructure that hosts a great number of services (Grid Services). These services not only solve specific applied problems but also help to find the necessary resources, collect information about the state of resources, and store and deliver data.

Informatization has now entered the fourth stage of its development. The first stage was connected with the appearance of large computers (mainframes), the second with the creation of personal computers, and the third with the appearance of the Internet, which united users into a single information space through shared access to information. The beginning of the 21st century brought a transition to new Grid technologies, in which the already familiar Internet with its web services is being replaced by a worldwide Grid network as the means of sharing computational power and data storage. Grid makes it possible to go beyond simple data exchange between computers and, in time, to turn their global network into a kind of gigantic virtual computer, accessible remotely from any point regardless of the user's location.

It should be admitted that Grid ideas are not yet widely known. But not so long ago (only eight to ten years ago) the Internet and the Web were also known only to a narrow circle of specialists, and in 2006 the number of Internet users exceeded one billion. Today it is hard to find a person who has not at least heard these words. There are reasons to believe that in time Grid will become just as popular. Its present state can be compared with the Internet as it was in 1997, and we can assume that the potential and growth rate of Grid are no lower than those of the Internet were then.

Translated literally, Grid means ‘grating’. Admittedly, the associations this word evokes in our language in no way match the idea of free cooperation between computers for high-performance computing that Grid technologies embody. The closest in meaning is perhaps the power grid: a shared, distributed resource to which anyone can easily connect through a socket and draw as much electrical energy as needed. By analogy, Grid lets users connect directly to a remote computing network without caring where the computing resources and data needed for the work come from, or which transmission lines, passwords or protocols are used. The counterpart of the electrical network infrastructure (power transmission lines, substations, transformers, etc.) is the Grid intermediate software layer, or middleware.

What Grid gives to scientists

Formally, the authors of the Grid concept are considered to be Ian Foster of Argonne National Laboratory and the University of Chicago and Carl Kesselman of the Information Sciences Institute of the University of Southern California. It was they who in 1998 first proposed the term ‘Grid computing’ for a universal hardware and software infrastructure that unites computers and supercomputers into a geographically distributed information and computing system. According to their definition, which has already become classic, ‘Grid is a coordinated, open and standardized environment which provides flexible, secure, coordinated sharing of resources within a virtual organization’. The word ‘computing’ or ‘metacomputing’ is usually used where higher-level systems are built on the basis of individual computers. One can well get used to this word (motorists have got used to the word ‘tuning’, ecologists to ‘monitoring’, sportsmen to ‘diving’, and all of us to ‘shopping’). Incidentally, the word ‘computer’ itself did not enter our language without difficulty when it replaced the phrase ‘electronic computing device’, which is hard to translate. So we can hope that the word ‘computing’ will likewise replace its equivalent phrase, ‘a service for performing computations or processing data on a computer’.
Grid computing is a new class of infrastructure in which a secure and scalable computing mechanism is built from remote resources: computers ranging from desktops to supercomputers, software packages and input/output devices. Grid rests on software technologies that use new standards and protocols together with well-known network and Internet protocols. Time will show whether the name Grid deserves to be written in Cyrillic.
The idea of using computational power more effectively by uniting a large number of computers into a single structure appeared in the scientific community long ago, in the era of large computers. As early as the 1980s, scientists (nuclear physicists first of all) were trying to unite the resources of separate workstations to solve complex mathematical problems and to use idle central processor capacity to reduce data-processing time. Computing networks in an organization usually develop in roughly the same way. At first a small group of users carrying out scientific or engineering research decides to pool its resources on the basis of simple rules and agreements. This is easy to do with freely distributed software. The successful experience catches on, and soon other user groups take the same path. The number of such groups grows, and they develop a quite legitimate wish to exchange resources and put idle computational power to use. At this point simple agreements are no longer enough, and some technical means of accounting and ‘mutual settlement’ has to be introduced.
Managing distributed resources is one of the most important tasks. Its primary aim is to keep the information infrastructure manageable as load grows and the number of network components increases. The operating principles of a task management system are well known: a waiting queue, a search for free resources, scheduling, policies and priorities. Network task management systems were implemented long ago, but Grid technologies make it possible to build a system that manages distributed computing resources. In that situation it no longer matters to users on which particular network node their task runs; they simply consume a certain amount of the virtual processor capacity available in the network.
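The principles of a task management system listed above (a queue, a search for free resources, scheduling, priorities) can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class names, the node names and the first-fit placement rule are invented for illustration and do not reproduce the behaviour of any real Grid middleware.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(order=True)
class Task:
    priority: int                      # lower number = served first
    seq: int                           # submission order, breaks ties (FIFO)
    name: str = field(compare=False)
    cpus: int = field(compare=False)   # processors the task asks for


class Scheduler:
    """Toy task manager: a priority queue plus a first-fit search over nodes."""

    def __init__(self, free_cpus: Dict[str, int]) -> None:
        self.free_cpus = dict(free_cpus)   # hypothetical node name -> idle CPUs
        self.queue: List[Task] = []
        self._seq = 0

    def submit(self, name: str, cpus: int, priority: int = 10) -> None:
        heapq.heappush(self.queue, Task(priority, self._seq, name, cpus))
        self._seq += 1

    def _find_node(self, cpus: int) -> Optional[str]:
        # Search for free resources: the first node with enough idle CPUs.
        for node, free in self.free_cpus.items():
            if free >= cpus:
                return node
        return None

    def dispatch(self) -> List[Tuple[str, str]]:
        """Place every queued task that fits; return (task, node) pairs."""
        placed: List[Tuple[str, str]] = []
        waiting: List[Task] = []
        while self.queue:
            task = heapq.heappop(self.queue)
            node = self._find_node(task.cpus)
            if node is None:
                waiting.append(task)       # no node is large enough yet
            else:
                self.free_cpus[node] -= task.cpus
                placed.append((task.name, node))
        for task in waiting:               # unplaced tasks stay in the queue
            heapq.heappush(self.queue, task)
        return placed


sched = Scheduler({"node-a": 4, "node-b": 8})
sched.submit("render", cpus=6)
sched.submit("urgent-fit", cpus=4, priority=1)
sched.submit("too-big", cpus=16)
assignments = sched.dispatch()
print(assignments)
```

The user who submits a task never names a node; the scheduler decides where it runs, which is the point made in the paragraph above. A task that fits nowhere simply waits in the queue until resources are released.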
There are several reasons that encourage scientists to use Grid technologies.
First, it is often necessary to process an enormous amount of data stored in different organizations (possibly located in different parts of the world). An example is the processing of photographs of the Earth received from satellites.
Second, research may require an enormous amount of computation, for instance when modeling the effect of thousands of molecules (potential drug candidates) on proteins in the search for a medicine to cure a certain disease.
Third, a scientific team whose members work in different corners of the world wants to share large data stores, analyze them quickly and interactively, and visualize and discuss the results online.
Clearly, the problems being solved are of great importance for fundamental scientific research and design work. They include studying the evolution of protoplanetary matter, the planets and the Earth; general weather forecasting and the prediction of natural disasters (tsunamis, earthquakes, volcanic eruptions); modeling and analysis of experiments in nuclear physics; research in nanotechnology; the design of aerospace vehicles and automobiles; DNA decoding and protein identification. Soon it will certainly be easier to name a scientific discipline that does not use supercomputers and distributed computing. Among the key factors that favor the introduction of Grid are not only the flexibility with which the infrastructure adapts to new demands, but also more effective use of existing computing and human resources, because specialists working together on different projects use one and the same infrastructure.

Grid in the world

Let us confine ourselves to the best-known Grid projects completed in the last several years or currently under way. In 2001 the TeraGrid project started in the USA, financed by the National Science Foundation; its main task was to create a distributed infrastructure for high-performance computing. In May 2004 the European Union created a counterpart of the American TeraGrid, the DEISA consortium, partly financed under the Sixth Framework Programme, which united the leading national supercomputer centres of the EU into a Grid network. At the end of March 2004 the three-year European DataGrid project was completed; it built a test infrastructure for computing and data exchange for the needs of the European scientific community. These developments became the basis of a new international project to create the high-performance scientific Grid network EGEE (Enabling Grids for E-science), which is led by CERN (the European nuclear research centre near Geneva, Switzerland) and financed by the European Union and the governments of the participating countries. The project currently includes 70 scientific institutions from 27 countries. It is to build the largest Grid in the world, with a total computational power of 20 000 powerful processors.

The leading role of CERN stems from the planned start in 2007 of the world's largest particle accelerator, the LHC (Large Hadron Collider), which will be the source of an enormous volume of information. The new computing infrastructure, created primarily for the LHC, must process this information effectively; its expected average annual volume is estimated at 10 Pbytes (1 Pbyte ≈ 1000 Tbytes). But the task of EGEE is not limited to nuclear physics: it is meant to realize the potential of Grid for many other scientific and technical fields as well. For example, the project team's immediate plans include creating a separate bioinformatics ‘Grid block’.
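To give a feeling for the figure of 10 Pbytes per year, it can be converted into an average sustained data rate. The short calculation below is only a back-of-the-envelope sketch: it assumes decimal units (1 Pbyte = 10^15 bytes), a 365-day year and a perfectly even data flow, none of which is stated in the article.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31 536 000 s, leap years ignored
BYTES_PER_PBYTE = 10 ** 15           # assumption: decimal petabyte (1000 Tbytes)


def average_rate_bytes_per_s(pbytes_per_year: float) -> float:
    """Average rate needed to move the given yearly volume evenly."""
    return pbytes_per_year * BYTES_PER_PBYTE / SECONDS_PER_YEAR


rate = average_rate_bytes_per_s(10)
print(f"{rate / 1e6:.0f} Mbytes/s, {rate * 8 / 1e9:.2f} Gbit/s")
# prints "317 Mbytes/s, 2.54 Gbit/s"
```

Even as an average, this is a continuous flow of several gigabits per second, and real traffic is bursty, which helps explain why the processing is spread over many centres linked by high-speed research networks.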

The main European network for education and science, GEANT, is also being developed in close interaction with the EGEE project. In the middle of last year the intergovernmental organization DANTE announced the launch of the next-generation research and education network GEANT 2, which serves 3 million users from 3.5 thousand academic institutions in 34 European countries. The new network will qualitatively change the processing of information from radio astronomy complexes whose recording systems are far apart, and will also serve CERN by transferring data after the LHC starts. In 2005 the European Commission prepared a special program costing 13 billion euros, in which Grid computing plays the role of a stimulus and an especially important resource for turning the European Union into ‘the most competitive knowledge economy in the world’.
The United States is now the absolute world leader in the practical construction of Grid networks. In 2004 George Bush officially announced the start of the presidential Strategic Grid Computing Initiative, whose main aim is ‘creating a single national space for high-performance computing’. Four national Grid networks already operate successfully in the USA under the close care of key government agencies: the computing network of the National Science Foundation, the information network supporting NASA, the global information network of the Department of Defense, and the network of the supercomputing initiative of the Department of Energy. Under the direction of the University of Pennsylvania, Grid technologies were used to create the National Digital Mammography Center with a total data volume of 5.6 petabytes, which gives health professionals fast access to the records of millions of patients.

Private American companies also contribute greatly to the development of Grid technologies. For example, Google, famous all over the world for its search engine, announced a project to build a global Grid system that turns computing into a utility service. Within this project all computing devices (PCs, cell phones, TV sets) become simply terminals connected to the Google server Grid, which delivers information to any device in any corner of the world.
Since 2000, work on mastering Grid technologies has been under way in China as well. For a long time information about the progress of the ChinaGrid project was in effect classified. The information bomb exploded in the middle of July 2006, when the Chinese mass media openly announced the completion of the China Educational Grid Project (CEGP). CEGP united the computer networks of several dozen of the country's largest universities and gave millions of Chinese students direct access to databases, online courses and service applications in a wide variety of fields and subjects. In January 2006 the start of a joint Grid project of the European Union and China (EUChinaGRID), financed by the European Commission, was officially announced in Athens. Its main goal is to unite the European and Chinese Grid infrastructures so that the scientific applications running in the Grid environment can work together more effectively. The planned strategic alliance of the European Union and China can well be regarded as one of the first attempts to create a strong ‘Grid counterbalance’ to the claims of the USA to world leadership in this large-scale technological race. India may soon join this alliance: it has also announced the start of its own national Grid project, GARUDA, which is expected to unite 17 of the country's largest research centers into a Grid network.

The main resource elements of Grid networks are supercomputers and supercomputer centers, and the most important infrastructure component is a high-speed data communication network. In the Northern hemisphere, construction is being completed of the global computer network GLORIAD, which will unite the computing resources of research organizations in the USA, Canada, Europe, Russia, China and South Korea (again, mostly physics centers). Wireless Internet (Wi-Fi) is now being introduced as a kind of electronic ‘public utility’ in some cities (for example, Philadelphia) and even in entire countries (Singapore).

Supercomputers that are not united into a geographically distributed system have at least three considerable disadvantages. First, they are very expensive equipment that quickly becomes obsolete (supercomputers from the first hundred of the Top-500 rating as a rule end up at the very tail of the list, or drop out of it altogether, within two to three years). Second, the computational power of a supercomputer can hardly be seriously upgraded, which often makes it impossible to use it for problems of a new level of complexity. And finally, the third ‘big minus’ is the low utilization of supercomputers caused by uneven processor load. Ideally, these disadvantages can be removed by uniting supercomputers into a Grid network. But effective operation of Grid networks first requires agreement on standardization (definition of service standards, interfaces, databases).
The authors of the Grid computing idea, Foster and Kesselman, also pioneered the development of the first standard for building Grid networks, the freely distributed middleware Globus Toolkit, which became the de facto international standard. In Europe, CERN has developed a modification of the middleware on the basis of Globus Toolkit, gLite, which became the basis of the European Grid network EGEE for scientific research mentioned above. The main task solved in Grid is providing access to resources, and since the resources are distributed, the operation of the network is supported by special services (compiling resource catalogues and tracking resource state; authorizing clients and their access to resources; cooperation and coordination in resource use; security; etc.). Access to resources is based on creating a Virtual Organization (VO), which consists of enterprises and individual specialists who share common resources.
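The idea of access through a Virtual Organization can be shown with a minimal sketch. All names in it (the classes, the user identities, the resource names) are hypothetical, and the membership check is a deliberate simplification; real middleware such as Globus Toolkit or gLite relies on certificates and dedicated services rather than on an in-memory table like this one.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class VirtualOrganization:
    name: str
    members: Set[str] = field(default_factory=set)     # user identities
    resources: Set[str] = field(default_factory=set)   # shared resource names


class ResourceCatalogue:
    """Toy catalogue service: keeps track of VOs and answers access requests."""

    def __init__(self) -> None:
        self.vos: Dict[str, VirtualOrganization] = {}

    def register(self, vo: VirtualOrganization) -> None:
        self.vos[vo.name] = vo

    def may_use(self, user: str, resource: str) -> bool:
        # Access is granted if some VO contains both the user and the resource.
        return any(user in vo.members and resource in vo.resources
                   for vo in self.vos.values())


catalogue = ResourceCatalogue()
catalogue.register(VirtualOrganization(
    name="geophysics",
    members={"alice@institute-a", "bob@university-b"},
    resources={"cluster-1", "archive-1"},
))

allowed = catalogue.may_use("alice@institute-a", "cluster-1")   # VO member
denied = catalogue.may_use("carol@company-c", "cluster-1")      # outsider
print(allowed, denied)
```

The design choice worth noting is that rights are attached to the organization, not to individual machines: a member from any institute gets the same access to every resource the VO shares.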

Grid in Russia and Ukraine

As is known, the initiator and coordinator of work on introducing in Europe the Grid infrastructure EGEE (Enabling Grids for E-science in Europe), which can give the European scientific community shared use of data and computing resources, is CERN (the European nuclear research centre). CERN actively involves staff of physics institutes in Russia, Ukraine and other countries in processing and analyzing experimental data from its Large Hadron Collider (LHC), the largest accelerator in particle physics. The EGEE infrastructure is built on the research network of the European Union, GEANT, and can interoperate with other Grid systems all over the world, including those in the USA and Asia, which favors the establishment of a worldwide Grid infrastructure.
Beginning in 2007, the European Grid Infrastructure (EGI) is to operate on a permanent basis as a cooperative network of National Grid Infrastructures (NGI). CERN retains general coordination and responsibility for modernizing the middleware layer and the general security system.

It is instructive to compare the activity and results in National Grid infrastructure development achieved by the academic institutes of Russia and Ukraine that CERN involved in its specialized Grid projects. In Russia, the RDIG consortium (Russian Data Intensive GRID) was created in 2003 to spread the EGEE infrastructure effectively in Russia while involving other organizations from different fields of science, education and industry. It comprises eleven leading physics institutes, four universities and the Geophysical Center of RAS. The consortium takes an active part in EGEE – RDIG activity and has created the following Grid infrastructure management groups: operations in the European Grid (SA1); support and management (SA2); provision of network resources (SA3); integration, testing and certification (SA4); general management (NA1); dissemination, exchange of experience and outreach (NA2); user training and attestation (NA3); testing and support of applications (NA4); international cooperation policy (NA5). For information support of the RDIG consortium the portal www.egee-rdig.ru has been developed, and to involve new users from different spheres of activity a Grid Certification Center has been created to issue users public key certificates (CA), which are introduced for the purpose of information security. European virtual organizations (VO) now operate in Russia, for instance in high energy physics (LHC – ATLAS, ALICE, CMS, etc.) and biomedicine, along with Russian VOs in geophysics, fusion, chemical physics and other fields.
The success of RDIG favored the introduction of Grid technologies beyond the EGEE project: recently the Russian Grid infrastructure RISA for scientific applications appeared, as well as the NumGRID (Numerical Grid) project with new powerful players. These are the RAS Interdepartmental Supercomputer Center (RAS ISC), which houses the most powerful clusters in the CIS countries, and the Institute of Computational Mathematics and Mathematical Geophysics of the Siberian Branch of RAS. The middleware they created makes it possible to use computers connected by networks of different capacities, which is convenient for users and requires only minor changes to the applications when they are installed.

In Ukraine (according to the site http://uag.bitp.kiev.ua/), with similar starting conditions, everything looks far more modest. Two physics institutes (KhFTI and ITF) are involved in CERN Grid projects in high energy physics. The ITF and KNU clusters are connected to the AliEn-Grid network to serve the ALICE experiment at the LHC accelerator, and the KhFTI centre is connected to the CERN network via the Russian RDIG (though none of these institutes is an official partner of the EGEE-II project, unlike the Russian physics institutes). Several staff members of these organizations obtained access certificates to EGEE resources via the Russian RDIG, because there was no National Certification Center at the time. Other Grid projects are carried out at the Institute of Cosmic Research of NASU-NCAU (processing of space imagery) and the Main Astronomical Observatory of NASU (astrophysics problems). The computing resources of the clusters of the Institute of Cybernetics of NASU, ITF and Taras Shevchenko KNU are united with the help of NorduGrid software. On the initiative of ITF, in April 2006 the corporate program ‘Grid technologies implementation and cluster building in National Academy of Science of Ukraine’ started in NASU. It differs considerably from the Russian RDIG, because it does not fully take into account nationwide needs or the scientific interests of scientists and organizations outside the NASU system. The explanation is that NASU held the view that before laying claim to national tasks it is necessary to gain practical Grid experience by creating a working system, and only then to offer finished developments for implementation in other organizations. But it is not clear why those other organizations should have to wait until NASU is able to do for them what they can do themselves.

That is why a group of 10 organizations, representing two academic institutes of NASU, six national technical universities of the Ministry of Education and Science of Ukraine (MESU) and two industrial enterprises, is starting to develop the Ukrainian National Grid Infrastructure (UGrid) with Government support, under the State program of the Cabinet of Ministers of Ukraine ‘Development of information and telecommunication technologies in education and science for the period of 2006-2010’ (ICT program, No. 1153 of 7/12/2005). This entirely self-sufficient team, which has everything needed to carry out the project, nevertheless felt it necessary to announce its decision openly in the press and to invite everyone willing to take part in creating the National Grid infrastructure to collaborate, so that the peculiarities of the Grid projects in which different user groups are interested can be better taken into account together. This invitation remains open, and the ‘Research tasks’ section of this page lists the subjects of such tasks.

But the country now faces a pressing task: having started late compared with other European countries, to create a National Grid Infrastructure that will have to pass an international audit and, according to the new requirements, satisfy the following criteria: to have state support, for instance through inclusion of the Grid infrastructure project in a State program with guaranteed financing; to represent the interests of all sections of society (scientists, staff of higher educational institutions, manufacturers, businessmen, etc.); to have a branched structure with coordinating, regional and resource centers that provide the basic Grid services, monitoring and response to emergencies, accounting of resources and of work done, management and support of virtual organizations (VO), and Grid software certification; to comply with international standards and rules; to maintain infrastructure security and to have the right to issue CA user certificates with the approval of EUGridPMA (the European policy management authority for Grid authentication); to be connected to GEANT, the European research and education network; and to have governing bodies in the form of a Council on National Grid infrastructure development and coordinating thematic groups NA1 - NA4 and SA1 – SA3, as in RDIG.
Of course, creating a National Grid infrastructure is not a very rewarding business for state organizations. It is like undertaking to bring the transport infrastructure of Kiev up to European standards given its present state and meagre financing. It is far more pleasant to use such an infrastructure afterwards and travel around the city without tedious traffic jams. A long list could be given of Grid projects that NTUU ‘KPI’ cannot begin immediately without a nationwide Grid infrastructure, which is what pushed NTUU ‘KPI’ to take part in creating UGrid.
Certainly, the national Grid infrastructure must be built by the joint efforts of NASU, MESU and industry organizations. But it is necessary to understand and agree that there are now several players on this field in Ukraine (NASU, MESU, industry), which naturally complement each other. On the NASU side: an authoritative Informatics department; considerable computing resources; practical experience of uniting high-performance computers into a network; experience of research by individual scientists in the real conditions of European Grid projects; and long-standing experience of developing algorithms for various scientific problems and the corresponding applications, from physics and biology to literary studies.
On the side of MESU, the higher educational institutions and industry: long-standing experience of building and researching distributed computing systems, in particular systems for collaborative networked design of high-technology products based on Grid technologies; experience of participation in the European BalticGrid project as an associate member; comparative tests of the Globus, NorduGrid and gLite middleware; an official agreement with DANTE on connecting the URAN computer network to the European GEANT network; an agreement with EUGridPMA on creating a CA service in Ukraine; enormous experience in solving scientific, technical and engineering problems, from technical and technological foresight of the development of science and technology and assessment of the value and consequences of decisions in these fields (connected with choosing priorities and setting financing volumes at the state or enterprise level) to modeling logic circuits with single-electron nanotransistors; rapidly growing computing resources and the most up-to-date experience of cluster building (the USTAR firm); and a practically inexhaustible reserve of talented, creative young people.
Given goodwill on both sides, it is possible, as in the case of the Russian RDIG, to begin by forming a combined consortium, UNGI (Ukrainian National Grid Initiative), which could include representatives of the existing UGrid and UAGI projects and coordinate the efforts and work stages of creating a single National Grid infrastructure for science and education, taking into account the problems and peculiarities of the different user groups of the country.
Such cooperation between NASU and MESU has now been established, and a joint proposal was submitted to the European EGEE-EGI program for 2008-2010.
