Networks
@tags:: #litâ/đbook/highlights
@links:: networks, network theory,
@ref:: Networks
@author:: Mark Newman
=this.file.name
Reference
=this.ref
Notes
The book is divided into four parts. Following a short introductory chapter, Part I describes the basic types of networks studied by present-day science and the empirical techniques used to determine their structure. Part II introduces the fundamental tools used in the study of networks, including the mathematical methods used to represent network structure, measures and statistics for quantifying network structure, and computer algorithms for calculating those measures and statistics. Part III describes mathematical models of network structure that can help us predict the behavior of networked systems and understand their formation and growth. And Part IV describes applications of network theory, including models of network resilience, epidemics taking place on networks, and network search processes.
- Location 4370
-
network is, in its simplest form, a collection of points joined together in pairs by lines.
- Location 6117
- blue,
Introduction
In the nomenclature of the field a point is referred to as a node or vertex1 and a line is referred to as an edge.
- Location 6117
- blue,
What can we learn from networks?
Networks capture the pattern of interactions between the parts of a system.
- Location 8739
-
A network is a simplified representation that reduces a system to an abstract structure or topology, capturing only the basics of connection patterns and little else. The systems studied can, and often do, have many other interesting features not represented by the networkâthe detailed behaviors of individual nodes, such as computers or people, for instance, or the precise nature of the interactions between them.
- Location 8740
-
Properties of networks
On the other hand, direct visualization of networks is only really useful for networks up to a few hundreds or thousands of nodes, and for networks that are relatively sparse, meaning that the number of edges is quite small. If there are too many nodes or edges then pictures of the network will be too complicated for the eye to comprehend and their usefulness becomes limited.
- Location 9178
-
An example of a useful (and widely used) class of network metrics are the centrality measures. Centrality quantifies how important nodes are in a network,
- Location 9613
- blue,
The degree of a node in a network is the number of edges attached to it. In a social network of friendships, for instance, such as 2 4 2 1 3 The number beside each node in this small network indicates the nodeâs degree. the network of Fig. 1.2, the degree of an individual is the number of friends he or she has within the network.
- Location 9613
- blue,
In many cases the nodes with the highest degrees in a network, those with the most connections, also play major roles in the functioning of the system, and hence degree can be a useful guide for focusing our attention on the systemâs most important elements.
- Location 9614
-
In undirected networks degree is just a single number, but in directed networks nodes have two different degrees, in-degree and out-degree, corresponding to the number of edges pointing inward and outward respectively.
- Location 9614
- blue,
A further observation concerning degree is that many networks are found to contain a small but significant number of âhubsâânodes of unusually high degree. Social networks, for instance, often contain a few individuals with Hubs are discussed further an unusually large number of acquaintances. The Web has a small fraction in Section 10.3. of websites with a very large number of links.
- Location 9615
- network hubs, 1evernote, degree centrality,
Although first studied in the context of friendship networks, this small-world effect appears to be widespread, occurring in essentially all types of networks.
- Location 10050
-
A third example of a network phenomenon of practical importance is the Community structure in occurrence of clusters or communities in networks. We are most of us familiar networks is discussed in detail in Chapter 14. with the idea that social networks break up into subcommunities.
- Location 10052
-
Technological networks
The Internet
Figure 2.1: A schematic depiction of the structure of the Internet. The nodes and edges of the Internet fall into a number of different classes: the backbone of high-bandwidth long-distance connections; the ISPs, who connect to the backbone and who are divided roughly into regional (larger) and local (smaller) ISPs; and the end usersâhome users, companies, and so forthâwho connect to the ISPs.
- Location 12671
-
If there were a central Internet government with a complete map of the system, then the job of determining the network structure would be easyâone would just look at the map. But there is no such organization and no such map. Instead the networkâs structure must be determined by experimental measurements. There are two primary methods for doing this. The first uses âtracerouteâ; the second uses BGP.
- Location 13109
-
Measuring Internet structure using traceroute
We can use traceroute (or a similar tool) to probe the network structure of the Internet. The idea is to assemble a large data set of traceroute paths between many different pairs of points on the Internet. With luck, most of the edges in the network (though usually not all of them) will appear in at least one of these paths, and the combination of all of them together should give a reasonably complete picture of the network.
- Location 13545
-
A subnet is a group of IP addresses defined as follows. IP addresses consist of four numbers, each one in the range from 0 to 255 (eight bits in binary) and typically written in a string separated by periods or dots.4 For example, the IP address of the main web server at the authorâs home institution, the University of Michigan, is 141.211.243.44. IP addresses are allocated to organizations in blocks. The University of Michigan, for instance, owns (among others) all the addresses of the form 141.211.243.xxx, where âxxxâ can be any number between 0 and 255. Such a block, where the first three numbers in the address are fixed and the last can be anything, is called a class C subnet. There are also class B subnets, which have the form 141.211.xxx.yyy, and class A subnets, which have the form 141.xxx.yyy.zzz.
- Location 14420
- blue,
A domain is a group of computers and routers under, usually, the control of a single organization and identified by a single domain name, normally the last two or three parts of a computerâs address when the address is written in human-readable text form
- Location 14856
- blue,
Measuring Internet structure using routing tables
we can construct a picture of the Internet at the autonomous system level by examining them. The process is similar to that for the traceroute method described in the previous section and depicted in Fig. 2.2. We must first obtain a set of router tables, which is normally done simply by asking router operators for access to their tables. Each router table contains a large number of paths starting from a single source (the router), and the union of the paths from many routers gives a good, though not complete, network snapshot in which the nodes are autonomous systems and the edges are the connections between autonomous systems. As with traceroute, it is important that the routers used be widely distributed across the network to avoid too much duplication of results, and the number of routers should be as large as possible to make the sampling of network edges as complete as possible.
- Location 15294
-
tools now exist for determining, at least approximately, the geographic location of a given IP address, domain, or autonomous system. Examples include NetAcuity, IP2Location, MaxMind, and many others. Geographic locations are determined primarily by looking them up in one of several registries that record the official addresses of the registered owners of IP addresses, domains, or autonomous systems. These addresses need not in all cases match the actual location of the corresponding computer hardware. For instance, the domain ibm.com is registered in New York City, but IBMâs principal operations are in California. Nonetheless, an approximate picture of the geographic distribution of the Internet can be derived by these methods, and there has been some interest in the results [477].
- Location 16168
-
- [note::What kinds of interesting use cases are there for combining network topologies and geographic maps? Supply chain, power generation, & airports come to mind.]
The telephone network
discrete packets the way Internet data are (though there are exceptionsâsee below). The telephone network is a circuit-switched network, which means that the telephone company has a number of lines or circuits available to carry telephone calls between different points and it assigns them to individual callers when those ... Figure 2.4: A sketch of the three-tiered structure of a traditional telephone network. Individual subscriber telephones are connected to local exchanges, which are connected in turn to long-distance offices. The long-distance offices are connected among themselves by trunk lines, and there may be some connections between local exchanges as well. callers place phone calls.
- Location 16605
-
Power grids
while there is a temptation to apply network models of the kind described in this book to try to explain the behavior of power grids, it is wise to be cautious. Power grids are complicated systems. The flow of power is governed not only by geometry and simple physical laws, but also by detailed control of the phases and voltages across transmission lines, monitored and adjusted on rapid timescales by sophisticated computer systems and on slower timescales by human operators. There is evidence to suggest that network topology has only a relatively weak effect on power failures and other powergrid phenomena, and that good prediction and modeling of power systems requires more detailed information than can be gleaned from a network representation alone [234, 378].
- Location 17915
-
- [note::Often, a multi-faceted approach is needed to fully understand a network (visualizing the network topology simply doesn't cut it)]
Transportation networks
Sen et al. argue, plausibly, that in the context of rail travel what matters to most people is whether there is a direct train to their destination or, if there is not, how many trains they will have to take to get there. People do not care so much about how many stops there are along the way, so long as they donât have to change trains. Thus, Sen et al. argue, a useful network representation in the case of rail travel is one in which the nodes represent locations and two nodes are connected by an edge if a single train runs between them. Then the distance between two nodes in the networkâthe number of edges you need to traverse to get from A to Bâis equal to the number of trains you would have to take. A better representation still (although Sen et al. did not consider it) would be a âbipartite network,â a network containing two types of node, one representing the locations and the other representing train routes. Edges in the network would then join locations to the routes that run through them.
- Location 18351
-
- [note::This underscores the importance of thinking carefully about how you might represent a network. It's possible that the network topology you come up with doesn't represent what people actually care about or what factors actually influence the behavior of the system or the agents within it.]
Delivery and distribution networks
Figure 2.6: Drainage basin of the Loess Plateau. The network of rivers and streams on the Loess Plateau in the Shanxi province of China. The tree-like structure of the network is clearly visibleâthere are no loops in the network, so water at any point in the network drains offthe plateau via a single path.
- Location 19224
-
Networks of information
The World Wide Web
At that time there were several similar information systems competing for dominance of the rapidly growing Internet, but the Web won the battle, largely because its inventors decided to give away for free the software technologies on which it was basedâthe Hypertext Markup Language (HTML) used to specify the appearance of pages and the Hypertext Transport Protocol (HTTP) used to transmit pages over the Internet.
- Location 20100
-
- [note::The power of open standards strikes again!]
In its simplest Breadth-first search is dis- form, the crawler performs a so-called breadth-first search on the web network, cussed at length in Section 8.5. as shown schematically in Fig. 3.2. One starts from any initial web page, downloads the text of that page over the Internet, and finds all the links in the text. Functionally, a link consists of an identifying âtagââa short piece of text marking the link as a linkâand a Uniform Resource Locator, or URL, a standardized computer address that says how and where the linked web page can be found. By scanning for the tags and then copying the adjacent URLs a web crawler can rapidly extract URLs for all the links on a web page, storing them in memory or on a disk drive. When it is done with the current page, it takes one of the URLs from its store, uses it to locate a new page on the Web, and downloads the text of that page, and so the process repeats.
- Location 20536
-
- [note::Are there any existing crawlers that build a network diagram based on connections between pages? This is basically what I'm looking to do with my Zotero library.]
Since the Web is a directed network, not all pages are reachable from ... a given starting point. In particular, it is clear that pages that have no incoming hyperlinksâpages that no one links to at allâcan never be found by a crawler that follows links. Taking this idea one step further, it is also the case that a page will never be found if it is only linked to by pages that themselves have no incoming links.
- Location 20976
-
- [note::Perhaps Google should be called a "webpage finder" instead of a "search engine"]
In the case of the World Wide Web the giant outcomponent is estimated to occupy only about a half of all web pages and the other half of the Web is unreachable [84].2
- Location 21410
-
- [note::Wow. I would not have guessed 50%.]
There are a number of capable web crawler programs available for free on the Internet, including wget, Nutch, GRUB, and Sphinx. While most of us donât have the time or the facilities to crawl billions of web pages, these programs can be useful for crawling small sets of pages or single websites, and much useful insight and information can be acquired by doing so.
- Location 21411
- 1action,
- [note::Haven't heard of these before. Will have to check them out.]
Citation networks
the Science Citation Index (along with its sister publications, the Social Science Citation Index and the Arts and Humanities Citation Index) is now one of the primary and most widely used sources of citation data. ... Another database, Scopus, provides a competing but largely similar service.
- Location 21848
-
Perhaps the best known of these is Google Scholar, the academic literature arm of the Google search engine.
- Location 21849
-
Other automated citation indexing projects include Citebase, which indexes physics papers, and CiteseerX, which indexes computer science.
- Location 22283
-
There is, however, at least one important difference between a citation netAcyclic networks are dis- work and the Web: a citation network is acyclic, while the Web is not. An acyclic cussed further in Section 6.4.1. network is one in which there are no closed loops of directed edges. On the World Wide Web, it is entirely possible to follow a succession of hyperlinks and end up back at the page you started at. On a citation network, by contrast, this is essentially impossible. The reason is that in order to cite a paper, that paper must already have been written. One cannot cite a paper that does not yet exist. Thus all the directed edges in a citation network point backward in time, from See Fig. 6.3 for an illustra- newer papers to older ones.
- Location 22285
- blue,
Citation networks have some interesting statistics. For instance, one study ... found that about 47% of all papers have never been cited at all [404]. Of the remainder, 9% have one citation, 6% have two, and it goes down quickly after that. Only 21% of all papers have 10 or more citations, and just 1% have 100 or more. These figures are a consequence of the power-law degree distribution of the networkâsee Section 10.4.
- Location 22286
-
- [note::I'd be super interesting to understand how social network topology plays into this phenomenon. I suspect this is highly intractable, however.]
Citation networks of the type described so far are the simplest but not the only possible network representation of citation patterns. An alternative representation is the cocitation network. Two papers are said to be cocited if they are both cited by the same third paper. Cocitation is often taken as an indicator that papers deal with related topics and there is good evidence that this is a reasonable assumption in many cases.
- Location 22720
-
One can also define a weighted cocitation network in which the edges have varying strengths: the strength of an edge between two papers is equal to the number of other papers that cite both.
- Location 22721
-
Another related concept, although one that is less often used, is bibliographic coupling. Two papers are said to be bibliographically coupled if they cite the same other papers (rather than being cited by the same papers). Bibliographic coupling, like cocitation, can be taken as an indicator that papers deal with related material and one can define a strength or weight of coupling by the number of common citations between two papers.
- Location 22721
-
Patent and legal citations
Citations may highlight dependencies between technologies, such as one invention relying for its operation on another, but more often patent citations are âdefensive,â meaning that the inventor cites the patent for a related previous technology and then presents an argument for why the new technology is sufficiently different from the old one to merit its own patent.
- Location 23157
-
The structure of patent networks reflects the organization of human technology in much the same way that the structure of academic citation networks reflects the organization of research knowledge.
- Location 23159
-
There are a number of interesting technological and legal questions, for instance concerning originality of patented inventions, emerging technologies, and antitrust policy, that can be addressed by examining patent citation networks [106, 161, 216, 247].
- Location 23594
-
- [note::I suspect that using patent networks to predict the trajectory of emerging technology is already a well-trodden field, but It might be good to verify this.]
In countries where law cases can be decided by judges rather than juries, such as civil cases or appeals in Europe or the US, a judge will frequently issue an âopinionâ after deciding a case, a narrative essay explaining his or her reasoning and conclusions. It is common practice in writing such an opinion to cite previous opinions issued in other cases in order to establish precedent, or occasionally to argue against it. Thus, like academic papers and patents, legal opinions form a citation network, with opinions being the nodes and citations being the directed edges, and again the network is approximately acyclic.
- Location 23594
-
- [note::How could legal citation networks be leveraged in animal law contexts?]
in recent years these indexes have made the jump to electronic form and are now available online. In the United States, for instance, two commercial services, LexisNexis and Westlaw,7 provide detailed data on legal opinions and their citations. In the past few years a number of studies of the structure of legal citation networks have been published using data derived from these services [186, 187, 295, 314].
- Location 23595
-
In principle it would be possible also to construct networks of cocitation or bibliographic coupling between either patents or legal opinions, but we are not aware of any studies yet published of such networks.
- Location 23595
-
- [note::What are the main advantages of these approaches in the context of legal networks?]
Other information networks
Peer-to-peer networks
Peer-to-peer file-sharing networks (sometimes abbreviated P2P) are a widely used form of computer network that combines aspects of information networks and technological networks. A peer-to-peer network is a network in which the nodes are computers containing information in the form, usually, of discrete files, and the edges between them are virtual links established for the purpose of sharing the contents of those files. The links exist only in softwareâthey indicate only the intention of one computer to communicate with another should the need arise.
- Location 24030
-
Recommender networks
Recommender networks represent peopleâs preferences for things, such as for certain products sold by a retailer. Online merchants, for instance, may keep records of which customers bought which products and sometimes ask them whether they liked the products. Many large supermarket chains record the purchases made by their regular customers (usually identified by a small card with a barcode on it that is scanned when purchases are made) and so can work We encountered bipartite out which products each customer buys frequently.
- Location 24904
-
Keyword indexes
Another type of information network, also bipartite in form, is the keyword index. An example is the index at the end of this book, which consists of a list of words or phrases, each accompanied by the numbers of the pages on which related information can be found. An index of this kind can be represented as a bipartite network, with two types of nodes representing words and pages, and an edge connecting each word to the pages on which it appears. In addition to their use in books, keyword indexes are routinely constructed as guides to other information collections, including sets of academic papers and the World Wide Web.
- Location 25342
-
- [note::Huh, never would've thought about this as a type of network! Interesting way to think about it.]
In the context of document retrieval, the classic method for determining document similarity and performing generalized searches of this type is latent semantic analysis, which is based on the application of the matrix technique known as singular value decomposition to the bipartite network of keywords and documents [288]. A number of other competing methods have also been developed in recent years, using techniques such as non-negative matrix factorization [291, 292], latent Dirichlet allocation [63], and other probabilistic approaches [236].
- Location 25779
- latent dirichlet allocation, latent semantic analysis, negative matrix factorization,
Social networks
a social network is any network in which the nodes represent people and the edges represent some form of connection between them, such as friendship.
- Location 26215
- blue,
The empirical study of social networks
One of the most important things to appreciate about social networks is that there are many different possible definitions of an edge in such a network and the particular definition one uses will depend on what questions one is interested in answering. Edges might represent friendship between individuals, but they could also represent professional relationships, exchange of goods
- Location 26654
- c1,
or money, communication patterns, romantic or sexual relationships, or many other types of connection.
- Location 27089
-
Direct questioning of experimental subjects is probably the most common method of determining the structure of social networks. We discuss it in detail in Section 4.2.
- Location 27089
-
Such is the power of social network analysis that its techniques have, since the time of Moreno and Davis et al., been applied to an extraordinary variety of different communities, issues, and problems [79]: friendship and acquaintance patterns in local communities and in the population at large [54,55,261,333,447, 452] and among university students [446,479] and schoolchildren [169,338,400]; ... contacts between business people and other professionals [117, 197]; boards of directors of companies [130, 131, 318]; collaborations of scientists [218, 219, 349], movie actors [20, 466], and musicians [206]; sexual contact networks [271, 305, 392, 411, 417] and dating patterns [52, 238]; covert and criminal networks such as networks of drug users [421] or terrorists [282]; historical networks [51, 377]; online communities such as Usenet [313, 431, 449] and Facebook [278, 298, 446, 452]; and social networks of animals [180, 315, 418, 419].
- Location 27092
- social network analysis, 1resource/paper,
Figure 4.2: The affiliation network of the âSouthernWomen Study.â This network (like all affiliation networks) has two types of node, the open circles at the bottom representing the 18 women who were the subjects of the study and the shaded circles at the top representing the social events they attended. The edges connect each woman to the events she attended. Data courtesy of L. Freeman and originally from Davis et al. [129].
- Location 27526
-
- [note::Reminds me of the molecule/drug development study I witnessed at 2024 Penn Grad Talks. Or maybe it was the one about the Russian drug market?]
Interviews and questionnaires
The most common general method for gathering data on social networks is simply to ask people questions. If you are interested in friendship networks, you ask people who their friends are. If you are interested in business relationships you ask people who they do business with, and so forth. The asking may take the form of direct interviews with participants or the completion of questionnaires, either on paper or electronically.
- Location 27963
-
- [note::I believe this is what was done by the author of the network stewardship book I read]
By using a questionnaire, the designers of the study can guarantee that questions are asked, to a good approximation, in a consistent order and with consistent wording. By employing an interviewer to do the asking the study gains flexibility and reliabilityâinterviewees often take studies more seriously when answering questions put to them by a human beingâand interviewers may be given some latitude to probe interviewees when they are unclear, unresponsive, or confused. These are important considerations, since misunderstanding and inconsistent responses to survey questions are substantial sources of error [320].
- Location 27963
-
- [note::This speaks to the fundamental challenge of social network analysis and network stewardship: accurately representing the structure of the network]
A good introduction to social survey design and implementation is given by Rea and Parker [403].
- Location 27964
-
To measure social networks, surveys typically employ a name generator, a question or series of questions that invite respondents to name others with whom they have contact of a specific kind. For example, in their classic study of friendship networks among schoolchildren, Rapoport and Horvath [400] asked children to complete a questionnaire that included items worded as follows: My best friend at Junior High School is: My second-best friend at Junior High School is: My third-best friend at Junior High School is: . . . My eighth-best friend at Junior High School is:
- Location 27964
-
This is a common issue: it is highly likely that any group of individuals surveyed will have at least some ties outside the group and one must decide what to do with these ties. Sometimes they are recorded. Sometimes, as here, they are not. Such details can be important since statistics derived from survey results will often depend on the decisions made.
- Location 28400
-
There are a number of points to note about the data produced by name generators. First, the network ties, friendships in the case above, are determined by one respondent nominating another. This is a fundamentally asymmetric process. Individual A identifies individual B as their friend. In many cases B will also identify A as their friend, but there is no guarantee that this will We encountered directed happen and it is not uncommon for nomination to go in only one direction. We networks previously in Chapter 1, in our discussion of the World Wide Web, and they are discussed in more detail in Section 6.4. normally think of friendship as a two-way relationship, but surveys suggest that this not always the case. As a result, data derived from name generators are often best represented as directed networks, networks in which edges run in a particular direction from one node to another. If two individuals nominate each other then we have two directed edges, one pointing in either direction.
- Location 28400
-
- [note::Networks produced via name generator surveys should often be represented as unidirectional, not bidirectional (since you can't assume that a given pair of Individuals will both consider each other friends 100% of the time)]
Each node in the network then has two degrees, an out-degreeâthe number of friends identified by the corresponding individualâand an in-degreeâthe number of others who identified the individual as a friend.
- Location 28401
- blue,
It is common, as in the example above, for the experimenter to place a limit on the number of names a respondent can give. In the study of Rapoport and Horvath this limit was eight. Studies that impose such a limit are called fixed choice studies. The alternative is a free choice study, which imposes no limit.
- Location 28402
- blue,
Not all surveys employing name generators produce directed networks. Sometimes we are interested in ties that are intrinsically symmetric between the two parties involved, in which case the edges in the network are properly represented as undirected. An example is networks of sexual contact, which are widely studied to help us understand the spread of sexually transmitted diseases [271,305,392,417]. In such networks a tie between individuals A and B means that A and B had sex.
- Location 28838
-
Surveys can and often do ask respondents not just to name those with whom they have ties but to describe the nature of those ties as well. For instance, questions may ask respondents to name people they both like and dislike, or to name those with whom they have certain types of contact, such as socializing together, working together, or asking for advice. For example, in a study of the social network of a group of medical doctors, Coleman et al. [117] asked respondents the following questions: Who among your colleagues do you turn to most often for advice? With whom do you most often discuss your cases in the course of an ordinary week? Who are the friends among your colleagues whom you see most often socially?
- Location 28839
-
A survey such as this, which asks about several types of interactions, ... effectively generates data on several different networks at onceâthe network of advice, the discussion network, and so forthâbut all built upon the same set of nodes. Networks such as this are sometimes called âmultilayerâ or âmultiplexâ Multilayer networks are networks.
- Location 28840
- blue,
Some of the most interesting results of social network The common tendency of studies concern the extent to which peopleâs choice of whom they associate people to associate with others who are similar to themselves in some way is called âhomophilyâ or âassortative mixing,â and we discuss it in detail in Sections 7.7 and 10.7. with reflects their own background and that of their associates. For instance, you might choose to socialize primarily with others of a similar age to yourself, but turn for advice to those who are older than you.
- Location 29274
-
- [note::Social network studies demonstrate the effect of homophily and other aspects of human behavior]
The main disadvantages of network studies based on direct questioning of participants are that they are first laborious and second inaccurate. The administering of interviews or questionnaires and the collation of the responses is a demanding job that has been only somewhat eased by the use of computers and online survey tools. As a result, most studies have been limited to a few tens or at most hundreds of respondentsâthe 34-node social network of Fig. 1.2 is a typical example. It is a rare study that contains more than a thousand actors, and studies such as the National Longitudinal Study of Adolescent Health,3 which compiled responses from over 90 000 participants, are very unusual and extraordinarily costly. Only a substantial public interest such as, in that case, the control of disease, can justify the expense of performing them.
- Location 29274
- network structure, network studies,
- [note::In the case of COVID, NOVID was an interesting use case - they used bluetooth to detect when one person was in close contact with another, allowing a real-time contract-tracing network to be built]
Data based on direct questioning may also be affected by biases of various kinds. Answers given by respondents are always to some extent subjective. If you ask people who their friends are, for instance, different people will interpret âfriendâ in different ways and thus give different kinds of answers, despite the best efforts of investigators to pose questions and record the answers in a uniform fashion. This problem is not unique to network studies. Virtually all social surveys suffer from such problems and a large body of expertise Experimental error in net- concerning techniques for dealing with them has been developed [320, 403].
- Location 29275
- pink,
Ego-centered networks
At the other end of the spectrum from sociometric studies lie studies of personal networks or ego-centered networks. 4 An ego-centered network is the network surrounding one particular individual, meaning that individual plus his or her immediate contacts. The individual in question is referred to as the ego and the contacts as alters.
- Location 29711
- blue,
An ego-centered network consisting of an ego and five alters.
- Location 29711
-
Typically, one constructs not just a single ego-centered network but several, centered on different egos drawn from the target population. In a telephone survey, for instance, one might call random telephone numbers in the target area and survey those who answer, asking them to identify others with whom they have a certain type of contact. Participants might also be asked to describe some characteristics both of themselves and of their alters, and perhaps to answer some other simple questions, such as which alters also have contact with one another.
- Location 29711
-
- [note::Is this the approach that was taken in that one study about identifying rumor spreaders in India to help promote social programs?]
Sometimes, however, we are primarily interested in local network properties, and ego-centered network studies can give us good data about these. For example, if we wish to know about the degrees of nodes in a networkâthe numbers of ties people haveâ then a study in which a random sample of people are each asked to list their contacts may give us everything we need. (Studies probing node degrees are discussed more below.) If we also gather data on contacts between alters, we can estimate clustering coefficients (see Section 7.3). If we have data on characteristics of egos and alters we can measure assortative mixing (Sections 7.7 and 10.7).
- Location 29712
-
Direct observation
One arena in which direct observation is essentially the only viable experimental technique is studies of the social networks of animals, since clearly animals cannot be surveyed using interviews or questionnaires. Not all animal species form interesting social networks, but informative studies have been performed of, among others, monkeys [180,418,419], kangaroos [214], and dolphins [121, 315].
- Location 31021
-
Data from archival or third-party records
Time-varying time-varying networks are sometimes called longitudinal studies, and you may occasionally encounter this term in the literature. networks have been the focus of increasing research attention in recent years. We discuss them further in Section 6.7.
- Location 31895
-
Studies of weblogs and journals have been performed, for example, by Adamic and Glance [4] and MacKinnon and Warren [317].
- Location 31897
-
Affiliation networks
An affiliation network is a network in which actors are connected via their membership in groups of some kind.
- Location 31898
- blue,
the most complete representation of an affiliation network is a network with two types of nodes representing the actors and the groups, with edges connecting actors to the groups to which they belongâsee Fig. 4.2 on page 50. In such a representation, We study bipartite networks in more detail in Section 6.6. called a âbipartite networkâ or âtwo-mode network,â there are no edges connecting actors directly to other actors or groups to other groups, only actors to groups.
- Location 32332
-
Also in the business domain, a number of studies have been conducted of the networks formed by the boards of directors of companies [130,131,318], where the actors are company directors and the groups are the boards on which they sit.
- Location 32333
-
The small-world experiment
Snowball sampling, contact tracing, and random walks
Techniques have been developed, however, for sampling these populations by making use of the social networks that connect their members together. The most widely used such technique is snowball sampling [162, 188, 445]. Note that, unlike the other experimental techniques discussed in this chapter, snowball sampling is not intended as a technique for probing the structure of social networks. Rather, it is a technique for studying hidden populations
- Location 34081
- c1, network structure, snowball sampling,
that relies on social networks for its operation. It is important to keep this distinction clear. To judge by the literature, some professional network scientists do not, a mistake that can result in erroneous conclusions and bad science.
- Location 34516
-
- [note::c2]
So investigators probe the population instead by getting some of its members to provide contact details for others. The typical survey starts offrather like a standard ego-centered network study (see Section 4.2.1). You find one initial member of the population of interest and interview them about themselves. Then, upon gaining their confidence, you invite them also to name other members of the target population with whom they are acquainted. Then you go and find those acquaintances and interview them in turn, asking them also to name further contacts, and so forth through a succession of âwavesâ of sampling. Pretty soon the process âsnowballsâ and you have a large sample of your target population to work with.
- Location 34517
-
particular, snowball sampling gives highly biased samples. In the limit of a large number of waves, snowball sampling samples actors non-uniformly with probability proportional to their âeigenvector centralityâ (see Section 7.1.2). In principle, knowing this, one could compensate for the non-uniformity by appropriately weighting the results, but in practice the limit of large number of waves is rarely reached, and in any case the eigenvector centrality cannot be calculated without knowledge of the complete contact network, which by definition we donât have, making correct weighting impossible. In short, snowball sampling gives biased samples of populations and there is little we can do about it. Nonetheless, the technique is sufficiently useful for finding populations that are otherwise hard to pin down that it has been widely used, biases and all, in studies over the past few decades.
- Location 34518
-
A technique closely related to snowball sampling is contact tracing, which is essentially a form of snowball sampling applied to disease incidence. Some diseases, such as tuberculosis and HIV, are considered in many countries to be sufficiently serious that, when someone is discovered to be carrying them, an effort must be made to track down all those who might also have been infected. Thus, when a patient tests positive for HIV, for instance, he or she will be questioned about recent sexual contacts, and possibly about other types of potentially disease-causing contacts, such as needle sharing if the patient is an injection drug user. Then health authorities will make an effort to track down the people so identified and test them for HIV also. The process is repeated with any who test positive, tracing their contacts as well, and so forth, until all leads have been exhausted.
- Location 34954
-
Data from contact tracing studies display biases similar in type and magnitude to those seen in snowball sampling and should be treated with the same caution. Indeed, they may contain extra biases as well, since contacts are rarely pursued when an individual tests negative for the disease in question, so the sample is necessarily dominated by carriers of the disease, who are themselves usually a biased sample of the population at large.
- Location 34955
-
There is another variant of snowball sampling that deals to some extent with the problems of bias in the sample. This is random-walk sampling [270, 445]. In this method one again starts with a single member of the target community and interviews them to determine their contacts. Then, however, instead of tracking down all of those contacts, one chooses one of them at random and interviews only that one at the next step. If the person in question cannot be found or declines to be interviewed, one chooses another contact, and the process is repeated.
- Location 34955
-
The advantage of the random-walk sampling method is that, as shown in Section 6.14.3, the asymptotic sampling probability of nodes in a random walk is simply proportional to node degree (see Eq. (6.44)). Whatâs more, the asymptotic regime in such studies is, unlike snowball sampling, reached quite quickly for relatively small sample sizes.8 Knowing this, and given that we determine degree (i.e., the number of contacts an individual has) as a part of the interview process, we can easily compensate for sampling bias by a suitable weighting of the results and make population estimates of quantities in a way that is, in theory at least, unbiased. In practice, many sources of bias remain, particularly those associated with participant subjectivity, inability to recall contacts, and non-participation of named contacts. Still, random-walk sampling is a significant improvement on standard snowball sampling. Its principal disadvantage is that it is relatively slow. Since the participants are interviewed serially, in a chain, rather than in parallel waves, a strict implementation of the method can take a long time to develop a large sample.
- Location 35390
-
In some cases it is considered unethical to get participants to name their contacts, particularly when the topic of a study is one of dubious legality, and permission to perform such studies may be withheld by the authorities. To circumvent this problem one can make use of respondent-driven sampling [421]. In this technique, participants are ... usually paid to take part, and enrollment is achieved by handing out tickets to interviewees. Rather than asking people to name their contacts, the interviewees are simply invited to give the tickets to their acquaintances, and told that both they and the acquaintances will receive payment if the acquaintance brings the ticket to the investigator and agrees to participate in the study. In this way, no one is ever asked to name names and all participants have actively volunteered their participation.
- Location 35392
-
In the case where a single ticket is given to each participant, the method is roughly equivalent to random-walk sampling and should in theory give a less biased sample than snowball sampling for the same reasons. In practice, a new bias is introduced because the recipient of the ticket is not necessarily chosen at random from an individualâs acquaintances. Also, tickets frequently get lost or their recipients decline to participate, remuneration notwithstanding, so one normally gives out more than one ticket to each participant, which complicates the sampling process and can introduce further biases [413].
- Location 35827
-
Biological networks
- Location 36264
- c1,
Biochemical networks
Metabolic networks
Proteinâprotein interaction networks
Genetic regulatory networks
Other biochemical networks
- Location 43256
-
Networks in the brain
Networks of neurons
- Location 44129
-
- [note::h3]
Networks of functional connectivity in the brain
Ecological networks
Food webs
A speciesâ position in this pecking order is called by ecologists its trophic level. Species at the bottom of the food web, of which there is just one in our exampleâthe phytoplanktonâhave trophic level 1. Those that prey on themâ krill, herbivorous planktonâhave trophic level 2, and so forth all the way up to the species at the top of the web, which have no predators at all.
- Location 48498
- blue,
Other ecological networks
The US National Center for Ecological Analysis and Synthesis has assembled an large collection of mutualistic network data in their Interaction Web Database, which can be found at http://www.nceas.ucsb.edu/interactionweb.
- Location 49809
- 1resource/data,
Mathematics of networks
Graph theory is a large field containing many results and we describe only a small fraction of them here, focusing on those most relevant to our goals in this book. Readers interested in pursuing the study of graph theory further might like to look at the books by Harary [230] or West [467].
- Location 51556
-
Networks and their representation
Table 6.1: Some examples of nodes and edges in networks
- Location 51993
-
Nodes and edges are also called sites and bonds in physics, and actors and ties in sociology.1 Edges may also be variously called links, connections, or interactions, depending on context.
- Location 51993
-
In the rare cases where there can be more than one edge between the same nodes we refer to those edges collectively as a multiedge.
- Location 51993
- blue,
Edges that connect nodes to themselves are called self-edges or self-loops.
- Location 51994
- blue,
A network that has neither self-edges nor multiedges is called a simple a special name given to networks with self-edges. They are just called ânetworks with self-edges.â network or simple graph. A network with multiedges is called a multigraph.
- Location 51994
- blue,
The adjacency matrix
The fundamental mathematical representation of a network is the adjacency matrix.
- Location 51994
- blue,
Figure 6.1: Two small networks. (a) A simple graph, i.e., one having no multiedges or self-edges. (b) A network with both multiedges and self-edges.
- Location 52429
-
Two points to note about the adjacency matrix are, first, that for a network such as this with no self-edges the diagonal matrix elements are all zero, and, second, that the matrix is symmetric, since if there is an edge between i and j then there is necessarily an edge between j and i.
- Location 52431
-
A multiedge is represented by setting the corresponding matrix element Aij equal to the multiplicity of the edge. For example, a double edge between nodes i and j is represented by Aij Aji 2.
- Location 52431
-
A single self-edge from node i to itself is represented by setting the corresponding diagonal element Aii of the ... matrix equal to 2. Why 2 and not 1? Essentially, it is because a self-edge from i to i has two ends, both of which are connected to node i. As we will see, many mathematical results concerning the adjacency matrix work out more neatly if the matrix is defined this way, and thus it has become the accepted definition.2
- Location 52431
-
To give an example, the adjacency matrix for the multigraph in Fig. 6.1b is
- Location 52867
-
One can also have multiple self-edges (or âmulti-self-edgesâ perhaps). Such edges are represented by setting the corresponding diagonal element of the adjacency matrix equal to twice the multiplicity of the edge: Aii 4 for a double self-edge, 6 for a triple, and so forth.
- Location 52867
-
Weighted networks
Such weighted or valued networks can be represented mathematically by an adjacency matrix with the elements Aij equal to the weights of the ... corresponding connections. Thus, the adjacency matrix A ©  « 0 2 1 2 0 0.5 1 0.5 0 ÂȘ Âź ÂŹ (6.4) represents a weighted network in which the connection between nodes 1 and 2 is twice as strong as that between 1 and 3, which in turn is twice as strong as that between 2 and 3. ... undirected case. With this choice, formulas and results involving the adjacency matrix work out most neatly.
- Location 52868
-
The weights in a weighted network are usually positive numbers, but there Networks with both positive and negative edges are discussed further in Section 7.5 when we consider the concept of structural balance. is no reason in theory why they could not be negative. Social networks of relations between people, for example, are sometimes constructed in which positive edge weights denote friendship or other cordial relationships and negative ones represent animosity.
- Location 53305
-
And if edges can have weights on them, it is not a huge leap to consider weights on nodes too, or more exotic types of values on either edges or nodes, such as vectors or categorical variables like colors.
- Location 53306
-
Directed networks
A directed network or directed graph, also called a digraph for short, is a network in which each edge has a direction, pointing from one node to another.
- Location 53740
- blue,
In general the adjacency matrix of a directed network is asymmetric, since the existence of an edge from i to j does not necessarily imply that there is also an edge from j to i.
- Location 53742
-
An important distinction, however, is that self-edges in a directed network are represented by setting the corresponding diagonal element to 1, not 2 as in the
- Location 53742
- self-edges, directed networks,
Acyclic networks
A cycle in a directed network is a closed loop of edges with the arrows on each of the edges pointing the same way around the loop.
- Location 54178
- blue,
Some directed networks, however, have no cycles and these are called acyclic networks.4,5 A self-edgeâan edge connecting a node to itselfâcounts as a cycle, so acyclic networks also have no self-edges.
- Location 54178
- blue,
So a simple algorithm for determining whether a network is acyclic is: 1. Find a node with no outgoing edges. 2. If no such node exists, the network is cyclic. Otherwise, if such a node does exist, remove it and all its ingoing edges from the network. 3. If all nodes have been removed, the network is acyclic. Otherwise, go back to step 1.
- Location 55052
-
Figure 6.4: A hypergraph and corresponding bipartite graph. These two networks convey the same informationâthe membership of five nodes in four different groups. (a) The hypergraph representation in which the groups are represented as hyperedges, denoted by the loops circling sets of nodes. (b) The bipartite representation in which we introduce four new nodes (open circles at the top) representing the four groups, with edges connecting each of the original five nodes (bottom) to the groups to which it belongs.
- Location 55488
-
Hypergraphs
Families can have more than two people in them and one way to represent such families is to use a generalized type of edge that joins more than two nodes. Such an edge is called a hyperedge and a network with hyperedges is called a hypergraph. 8
- Location 55489
- blue, hypergraph, hyperedge, 1evernote,
Bipartite networks
A bipartite network, also called a two-mode network in the sociology literature, is a network with two kinds of nodes, and edges that run only between nodes We discussed bipartite networks previously in the context of recommender networks in Section 3.3.2, affiliation networks in Section 4.5, and metabolic networks in Section 5.1.1. of different kindsâsee Fig. 6.5. Bipartite networks are most commonly used to represent the membership of a set of people or objects in groups of some kind. The people are represented by one set of nodes, the groups by the other, and edges join the people to the groups to which they belong. When used in this way a bipartite network captures exactly the same information as the hypergraphs of Section 6.5 (see Fig. 6.4) but for most purposes the bipartite graph is more convenient and it is certainly more widely used.
- Location 55926
- blue,
The incidence matrix and network projections
In some cases we would prefer to work with a network with only one type of nodeâa network of people alone, for instance, without the group nodes. One way to create such a network is to get rid of the group nodes and directly join together any two people who belong to the same group, creating a so-called one-mode projection of the two-mode bipartite form.
- Location 56364
- blue,
Figure 6.6: The two one-mode projections of a bipartite network. The central portion of this figure shows a bipartite network with four nodes of one type (open circles labeled A to D) and seven of another (filled circles, 1 to 7). At the top and bottom we show the one-mode projections of the network onto the two sets of nodes.
- Location 56799
-
When we form a one-mode projection, each group in the bipartite network results in a cluster of nodes in the projected network that are all connected to each otherâa âcliqueâ in network jargon (see Section 7.2.1).
- Location 56800
-
Projections are useful and widely employed, but their construction discards a lot of the information present in the original bipartite network and hence they are, in a sense, a less powerful representation of our data. For example, a projection loses any information about how many groups two nodes share in common. ... We can add information of this kind to our projection by making the projection weighted, giving each edge between two nodes in the projected network a weight equal to the number of common groups the nodes share. This weighted network still does not capture all the information in the bipartite originalâit doesnât record the total number of groups or the exact membership of each group for instanceâbut it is an improvement on the unweighted version and is quite widely used.
- Location 56800
-
Multilayer and dynamic networks
Figure 6.7: Multilayer and multiplex networks. (a) A multilayer network consists of a set of layers, each containing its own network, plus interlayer edges connecting nodes in different layers (dashed lines). An example is a transportation network with layers corresponding to airlines, trains, buses, and so forth. (b) A multiplex network is a special case of a multilayer network in which the nodes represent the same set of objects or people in each layer. For instance, a social network with several different types of connections could be represented as a multiplex network with one layer for each type. Dynamic or temporal networks are another example, where the layers represent snapshots over time of the structure of a single, time-varying network. In principle one can include interlayer edges in a multiplex network, as here, to represent the equivalence of nodes in different layers, although in practice these are often omitted.
- Location 57672
-
A multilayer network is a set of individual networks, each representing nodes of one particular type and their connections, plus interlinking edges between networksâsee Fig. 6.7a. The individual networks are referred to as layers. Thus, in our transportation example there might be a layer representing airports and flights, a layer representing train stations and train routes, and so on. Connections between layers could then be used to join nodes that are in the same geographical location, or at least close enough for easy walking. Many airports have train stations, for instance, and many train stations have bus stops. Paths through the resulting multilayer transportation network then represent possible passenger journeys: a passenger catches the bus to the train station for instance (represented by an edge in the bus route layer), walks from the bus stop into the station (an interlayer edge), then takes a train to their destination (an edge in the train route layer).
- Location 57673
- blue,
An important special case of a multilayer network occurs when the nodes in each layer represent the same set of points, objects, or individuals. Such networks, which are called multiplex networks, represent systems in which there ... is only one type of node but more than one type of edge. An archetypal example is a social network, in which the nodes represent people, but there are many different kinds of connections between themâfriendship connections, family connections, business connections, and so forthâeach represented by a separate layer.
- Location 57674
- blue,
Another special case of a multilayer network is a dynamic ortemporal network, a network whose structure changes over time. Most networks in fact do change over timeâthe Internet, the Web, social networks, neural networks, ecological networks, and many others change on a range of different time-scales.
- Location 58110
- blue,
The usual approach is to measure the structure of the network repeatedly at distinct time intervals, resulting in a sequence of snapshots of the system, individual networks that can be thought of collectively as a multilayer network in which the layers have a specific ordering in time. In some examples only the edges change over time and not the nodes, in which case we have a multiplex network. In others the nodes can also appear or disappear, in which case a full multilayer network is needed to capture the structure, with different sets of nodes in different layers and interlayer edges between consecutive layers to indicate which nodes are equivalent.
- Location 58110
-
Mathematically, a multiplex network can be represented by a set of n à n adjacency matrices Aα, one for each layer α (or each time point in the case of a dynamic network). Equivalently, one can think of the elements A α ij of these matrices as forming a three-dimensional tensor, and tensor analysis methods can usefully be applied to multiplex networks [264].
- Location 58112
- multiplex network,
There have been a number of studies of transportation networks of the type discussed above, which are true multilayer networks [135, 198], and a range of other examples can be found in the literature. We refer the interested reader to the reviews by Boccaletti et al. [67], De Domenico et al. [134], and KivelÀ et al. [264], and the book by Bianconi [60].
- Location 58547
- multilayer networks,
Trees
A treeis a connected, undirected network that contains no loopsâsee Fig. 6.8a.9 By âconnectedâ we mean that every node in the network is reachable from every The disconnected parts of a network are called âcomponentsââsee Section 6.12. other via some path through the network. A network can also consist of two or more parts, disconnected from one another, and if an individual part has no loops it is also called a tree. If all the parts of the network are trees, the complete network is called a forest.
- Location 58547
- trees, anki, forests, blue,
Trees also occur commonly in computer science, where they are used as a basic building block for data structures such as AVL trees and heaps [9, 122] and in other theoretical contexts like minimum spanning trees [122], Cayley trees or Bethe lattices [388], and hierarchical models of networks (see Sections 14.7.2 and 18.3.2 and Refs. [109, 268, 465]).
- Location 58984
-
Perhaps the most important property of trees for our purposes is that, since they have no closed loops, there is exactly one path between any pair of nodes. (In a forest there is at most one path, but there may be none.)
- Location 58984
-
Another important property of trees is that a tree of n nodes always has exactly n â 1 edges.
- Location 59420
-
The reverse is also true, that any connected network with n nodes and n â1 edges is a tree. If such a network were not a tree then there must be a loop in the network somewhere, implying that we could remove an edge without disconnecting any part of the network.
- Location 59421
-
Planar networks
A planar network is a network that can be drawn on a plane without having any edges cross.11 Figure 6.9a shows a small planar network. Note that it is in most cases also possible to draw a planar network so that some edges do cross (Fig. 6.9b). The definition of planarity only specifies that at least one arrangement of the nodes exists that results in no crossing.
- Location 59422
- blue,
Figure 6.9: Two drawings of a planar network. (a) A small planar network with four nodes and six edges. It is self-evident that the network is planar, since in this depiction it has no edges that cross. (b) The same network redrawn with two of its edges crossing. Even though the edges cross, the network is still planarâa network is planar if it can be drawn without crossing edges. How you actually draw it is up to you.
- Location 59857
-
What we would really like is some measure of the degree of planarity of a network, a measure that could tell us, for example, that the road network is 99% planar, even though there are a few bridges or tunnels here and there. One possible such measure is the minimum number of edge crossings with which the network can be drawn. This, however, would be a difficult quantity to calculate since, at least in the simplest approach, its evaluation would require us to try every possible way of drawing the network, of which there are an impossibly large number for all but the smallest of networks. Perhaps another approach would be to look at the number of occurrences of K5 or UG in the network. So far, however, no widely accepted metric for degree of planarity has emerged. If such a measure were to gain currency it might well find occasional use in the study of real-world networks.
Degree
- Location 60733
-
The degree of a node in an undirected network is the number of edges connected to itâsee Fig. 6.11. In a social network of friendships between individuals, for instance, a personâs degree is the number of friends they have.
- Location 60733
- blue,
Occasionally we will come across networks in which all nodes have the same degree. In graph theory, such networks are called regular graphs or regular networks. A regular network in which all nodes have degree k is sometimes called k-regular. An example of a regular network is a periodic lattice such as a square or triangular lattice. On the square lattice, for instance, every node has degree four.
- Location 61605
- blue,
Density and sparsity
The maximum possible number of edges in a simple network (i.e., one with no multiedges or self-edges) is n 2 1 2 n(n â 1). The connectance or density Ï of a network is the fraction of those edges that are actually present:
- Location 61605
- blue,
Now consider a sequence of networks of increasing size n. If the density Ï remains non-zero as n becomes large the networks are said to be dense. In a dense network the fraction of non-zero elements in the adjacency matrix is non-vanishing in the limit of large n. A network where Ï â 0 in the limit of large n is said to be sparse, and the fraction of non-zero elements in the adjacency matrix tends to zero.
- Location 61606
- network density,
When we are working with theoretical models of networks, as we will in later chapters of the book, we can take the limit formally and state whether a network is sparse or dense, but in practical situations involving observed
- Location 61607
- c1,
networks we cannot do this. We cannot take the limit as an empirical metabolic network or food web becomes largeâwe are stuck with the network nature gives us. For such networks there is no formal sense in which they are either sparse or dense. Informally, on the other hand, one does often hear a network described, for example, as being sparse. Usually this just means that the value of Ï is small. In this qualitative sense, âsparseâ just means that most of the possible edges that could exist in the network are not present.
- Location 62041
-
- [note::Real-world networks have no formal definition for sparse v.s. dense - they are informally somewhere on the spectrum between sparse or dense.]
dg-publish: true
created: 2024-07-01
modified: 2024-07-01
title: Networks
source: kindle
@tags:: #litâ/đbook/highlights
@links:: networks, network theory,
@ref:: Networks
@author:: Mark Newman
=this.file.name
Reference
=this.ref
Notes
The book is divided into four parts. Following a short introductory chapter, Part I describes the basic types of networks studied by present-day science and the empirical techniques used to determine their structure. Part II introduces the fundamental tools used in the study of networks, including the mathematical methods used to represent network structure, measures and statistics for quantifying network structure, and computer algorithms for calculating those measures and statistics. Part III describes mathematical models of network structure that can help us predict the behavior of networked systems and understand their formation and growth. And Part IV describes applications of network theory, including models of network resilience, epidemics taking place on networks, and network search processes.
- Location 4370
-
network is, in its simplest form, a collection of points joined together in pairs by lines.
- Location 6117
- blue,
Introduction
In the nomenclature of the field a point is referred to as a node or vertex1 and a line is referred to as an edge.
- Location 6117
- blue,
What can we learn from networks?
Networks capture the pattern of interactions between the parts of a system.
- Location 8739
-
A network is a simplified representation that reduces a system to an abstract structure or topology, capturing only the basics of connection patterns and little else. The systems studied can, and often do, have many other interesting features not represented by the networkâthe detailed behaviors of individual nodes, such as computers or people, for instance, or the precise nature of the interactions between them.
- Location 8740
-
Properties of networks
On the other hand, direct visualization of networks is only really useful for networks up to a few hundreds or thousands of nodes, and for networks that are relatively sparse, meaning that the number of edges is quite small. If there are too many nodes or edges then pictures of the network will be too complicated for the eye to comprehend and their usefulness becomes limited.
- Location 9178
-
An example of a useful (and widely used) class of network metrics are the centrality measures. Centrality quantifies how important nodes are in a network,
- Location 9613
- blue,
The degree of a node in a network is the number of edges attached to it. In a social network of friendships, for instance, such as 2 4 2 1 3 The number beside each node in this small network indicates the nodeâs degree. the network of Fig. 1.2, the degree of an individual is the number of friends he or she has within the network.
- Location 9613
- blue,
In many cases the nodes with the highest degrees in a network, those with the most connections, also play major roles in the functioning of the system, and hence degree can be a useful guide for focusing our attention on the systemâs most important elements.
- Location 9614
-
In undirected networks degree is just a single number, but in directed networks nodes have two different degrees, in-degree and out-degree, corresponding to the number of edges pointing inward and outward respectively.
- Location 9614
- blue,
A further observation concerning degree is that many networks are found to contain a small but significant number of âhubsâânodes of unusually high degree. Social networks, for instance, often contain a few individuals with Hubs are discussed further an unusually large number of acquaintances. The Web has a small fraction in Section 10.3. of websites with a very large number of links.
- Location 9615
- network hubs, 1evernote, degree centrality,
Although first studied in the context of friendship networks, this small-world effect appears to be widespread, occurring in essentially all types of networks.
- Location 10050
-
A third example of a network phenomenon of practical importance is the Community structure in occurrence of clusters or communities in networks. We are most of us familiar networks is discussed in detail in Chapter 14. with the idea that social networks break up into subcommunities.
- Location 10052
-
Technological networks
The Internet
Figure 2.1: A schematic depiction of the structure of the Internet. The nodes and edges of the Internet fall into a number of different classes: the backbone of high-bandwidth long-distance connections; the ISPs, who connect to the backbone and who are divided roughly into regional (larger) and local (smaller) ISPs; and the end usersâhome users, companies, and so forthâwho connect to the ISPs.
- Location 12671
-
If there were a central Internet government with a complete map of the system, then the job of determining the network structure would be easyâone would just look at the map. But there is no such organization and no such map. Instead the networkâs structure must be determined by experimental measurements. There are two primary methods for doing this. The first uses âtracerouteâ; the second uses BGP.
- Location 13109
-
Measuring Internet structure using traceroute
We can use traceroute (or a similar tool) to probe the network structure of the Internet. The idea is to assemble a large data set of traceroute paths between many different pairs of points on the Internet. With luck, most of the edges in the network (though usually not all of them) will appear in at least one of these paths, and the combination of all of them together should give a reasonably complete picture of the network.
- Location 13545
-
A subnet is a group of IP addresses defined as follows. IP addresses consist of four numbers, each one in the range from 0 to 255 (eight bits in binary) and typically written in a string separated by periods or dots.4 For example, the IP address of the main web server at the authorâs home institution, the University of Michigan, is 141.211.243.44. IP addresses are allocated to organizations in blocks. The University of Michigan, for instance, owns (among others) all the addresses of the form 141.211.243.xxx, where âxxxâ can be any number between 0 and 255. Such a block, where the first three numbers in the address are fixed and the last can be anything, is called a class C subnet. There are also class B subnets, which have the form 141.211.xxx.yyy, and class A subnets, which have the form 141.xxx.yyy.zzz.
- Location 14420
- blue,
A domain is a group of computers and routers under, usually, the control of a single organization and identified by a single domain name, normally the last two or three parts of a computerâs address when the address is written in human-readable text form
- Location 14856
- blue,
Measuring Internet structure using routing tables
we can construct a picture of the Internet at the autonomous system level by examining them. The process is similar to that for the traceroute method described in the previous section and depicted in Fig. 2.2. We must first obtain a set of router tables, which is normally done simply by asking router operators for access to their tables. Each router table contains a large number of paths starting from a single source (the router), and the union of the paths from many routers gives a good, though not complete, network snapshot in which the nodes are autonomous systems and the edges are the connections between autonomous systems. As with traceroute, it is important that the routers used be widely distributed across the network to avoid too much duplication of results, and the number of routers should be as large as possible to make the sampling of network edges as complete as possible.
- Location 15294
-
tools now exist for determining, at least approximately, the geographic location of a given IP address, domain, or autonomous system. Examples include NetAcuity, IP2Location, MaxMind, and many others. Geographic locations are determined primarily by looking them up in one of several registries that record the official addresses of the registered owners of IP addresses, domains, or autonomous systems. These addresses need not in all cases match the actual location of the corresponding computer hardware. For instance, the domain ibm.com is registered in New York City, but IBMâs principal operations are in California. Nonetheless, an approximate picture of the geographic distribution of the Internet can be derived by these methods, and there has been some interest in the results [477].
- Location 16168
-
- [note::What kinds of interesting use cases are there for combining network topologies and geographic maps? Supply chain, power generation, & airports come to mind.]
The telephone network
discrete packets the way Internet data are (though there are exceptionsâsee below). The telephone network is a circuit-switched network, which means that the telephone company has a number of lines or circuits available to carry telephone calls between different points and it assigns them to individual callers when those ... Figure 2.4: A sketch of the three-tiered structure of a traditional telephone network. Individual subscriber telephones are connected to local exchanges, which are connected in turn to long-distance offices. The long-distance offices are connected among themselves by trunk lines, and there may be some connections between local exchanges as well. callers place phone calls.
- Location 16605
-
Power grids
while there is a temptation to apply network models of the kind described in this book to try to explain the behavior of power grids, it is wise to be cautious. Power grids are complicated systems. The flow of power is governed not only by geometry and simple physical laws, but also by detailed control of the phases and voltages across transmission lines, monitored and adjusted on rapid timescales by sophisticated computer systems and on slower timescales by human operators. There is evidence to suggest that network topology has only a relatively weak effect on power failures and other powergrid phenomena, and that good prediction and modeling of power systems requires more detailed information than can be gleaned from a network representation alone [234, 378].
- Location 17915
-
- [note::Often, a multi-faceted approach is needed to fully understand a network (visualizing the network topology simply doesn't cut it)]
Transportation networks
Sen et al. argue, plausibly, that in the context of rail travel what matters to most people is whether there is a direct train to their destination or, if there is not, how many trains they will have to take to get there. People do not care so much about how many stops there are along the way, so long as they donât have to change trains. Thus, Sen et al. argue, a useful network representation in the case of rail travel is one in which the nodes represent locations and two nodes are connected by an edge if a single train runs between them. Then the distance between two nodes in the networkâthe number of edges you need to traverse to get from A to Bâis equal to the number of trains you would have to take. A better representation still (although Sen et al. did not consider it) would be a âbipartite network,â a network containing two types of node, one representing the locations and the other representing train routes. Edges in the network would then join locations to the routes that run through them.
- Location 18351
-
- [note::This underscores the importance of thinking carefully about how you might represent a network. It's possible that the network topology you come up with doesn't represent what people actually care about or what factors actually influence the behavior of the system or the agents within it.]
Delivery and distribution networks
Figure 2.6: Drainage basin of the Loess Plateau. The network of rivers and streams on the Loess Plateau in the Shanxi province of China. The tree-like structure of the network is clearly visibleâthere are no loops in the network, so water at any point in the network drains offthe plateau via a single path.
- Location 19224
-
Networks of information
The World Wide Web
At that time there were several similar information systems competing for dominance of the rapidly growing Internet, but the Web won the battle, largely because its inventors decided to give away for free the software technologies on which it was basedâthe Hypertext Markup Language (HTML) used to specify the appearance of pages and the Hypertext Transport Protocol (HTTP) used to transmit pages over the Internet.
- Location 20100
-
- [note::The power of open standards strikes again!]
In its simplest Breadth-first search is dis- form, the crawler performs a so-called breadth-first search on the web network, cussed at length in Section 8.5. as shown schematically in Fig. 3.2. One starts from any initial web page, downloads the text of that page over the Internet, and finds all the links in the text. Functionally, a link consists of an identifying âtagââa short piece of text marking the link as a linkâand a Uniform Resource Locator, or URL, a standardized computer address that says how and where the linked web page can be found. By scanning for the tags and then copying the adjacent URLs a web crawler can rapidly extract URLs for all the links on a web page, storing them in memory or on a disk drive. When it is done with the current page, it takes one of the URLs from its store, uses it to locate a new page on the Web, and downloads the text of that page, and so the process repeats.
- Location 20536
-
- [note::Are there any existing crawlers that build a network diagram based on connections between pages? This is basically what I'm looking to do with my Zotero library.]
Since the Web is a directed network, not all pages are reachable from ... a given starting point. In particular, it is clear that pages that have no incoming hyperlinksâpages that no one links to at allâcan never be found by a crawler that follows links. Taking this idea one step further, it is also the case that a page will never be found if it is only linked to by pages that themselves have no incoming links.
- Location 20976
-
- [note::Perhaps Google should be called a "webpage finder" instead of a "search engine"]
In the case of the World Wide Web the giant outcomponent is estimated to occupy only about a half of all web pages and the other half of the Web is unreachable [84].2
- Location 21410
-
- [note::Wow. I would not have guessed 50%.]
There are a number of capable web crawler programs available for free on the Internet, including wget, Nutch, GRUB, and Sphinx. While most of us donât have the time or the facilities to crawl billions of web pages, these programs can be useful for crawling small sets of pages or single websites, and much useful insight and information can be acquired by doing so.
- Location 21411
- 1action,
- [note::Haven't heard of these before. Will have to check them out.]
Citation networks
the Science Citation Index (along with its sister publications, the Social Science Citation Index and the Arts and Humanities Citation Index) is now one of the primary and most widely used sources of citation data. ... Another database, Scopus, provides a competing but largely similar service.
- Location 21848
-
Perhaps the best known of these is Google Scholar, the academic literature arm of the Google search engine.
- Location 21849
-
Other automated citation indexing projects include Citebase, which indexes physics papers, and CiteseerX, which indexes computer science.
- Location 22283
-
There is, however, at least one important difference between a citation netAcyclic networks are dis- work and the Web: a citation network is acyclic, while the Web is not. An acyclic cussed further in Section 6.4.1. network is one in which there are no closed loops of directed edges. On the World Wide Web, it is entirely possible to follow a succession of hyperlinks and end up back at the page you started at. On a citation network, by contrast, this is essentially impossible. The reason is that in order to cite a paper, that paper must already have been written. One cannot cite a paper that does not yet exist. Thus all the directed edges in a citation network point backward in time, from See Fig. 6.3 for an illustra- newer papers to older ones.
- Location 22285
- blue,
Citation networks have some interesting statistics. For instance, one study ... found that about 47% of all papers have never been cited at all [404]. Of the remainder, 9% have one citation, 6% have two, and it goes down quickly after that. Only 21% of all papers have 10 or more citations, and just 1% have 100 or more. These figures are a consequence of the power-law degree distribution of the networkâsee Section 10.4.
- Location 22286
-
- [note::I'd be super interesting to understand how social network topology plays into this phenomenon. I suspect this is highly intractable, however.]
Citation networks of the type described so far are the simplest but not the only possible network representation of citation patterns. An alternative representation is the cocitation network. Two papers are said to be cocited if they are both cited by the same third paper. Cocitation is often taken as an indicator that papers deal with related topics and there is good evidence that this is a reasonable assumption in many cases.
- Location 22720
-
One can also define a weighted cocitation network in which the edges have varying strengths: the strength of an edge between two papers is equal to the number of other papers that cite both.
- Location 22721
-
Another related concept, although one that is less often used, is bibliographic coupling. Two papers are said to be bibliographically coupled if they cite the same other papers (rather than being cited by the same papers). Bibliographic coupling, like cocitation, can be taken as an indicator that papers deal with related material and one can define a strength or weight of coupling by the number of common citations between two papers.
- Location 22721
-
Patent and legal citations
Citations may highlight dependencies between technologies, such as one invention relying for its operation on another, but more often patent citations are âdefensive,â meaning that the inventor cites the patent for a related previous technology and then presents an argument for why the new technology is sufficiently different from the old one to merit its own patent.
- Location 23157
-
The structure of patent networks reflects the organization of human technology in much the same way that the structure of academic citation networks reflects the organization of research knowledge.
- Location 23159
-
There are a number of interesting technological and legal questions, for instance concerning originality of patented inventions, emerging technologies, and antitrust policy, that can be addressed by examining patent citation networks [106, 161, 216, 247].
- Location 23594
-
- [note::I suspect that using patent networks to predict the trajectory of emerging technology is already a well-trodden field, but It might be good to verify this.]
In countries where law cases can be decided by judges rather than juries, such as civil cases or appeals in Europe or the US, a judge will frequently issue an âopinionâ after deciding a case, a narrative essay explaining his or her reasoning and conclusions. It is common practice in writing such an opinion to cite previous opinions issued in other cases in order to establish precedent, or occasionally to argue against it. Thus, like academic papers and patents, legal opinions form a citation network, with opinions being the nodes and citations being the directed edges, and again the network is approximately acyclic.
- Location 23594
-
- [note::How could legal citation networks be leveraged in animal law contexts?]
in recent years these indexes have made the jump to electronic form and are now available online. In the United States, for instance, two commercial services, LexisNexis and Westlaw,7 provide detailed data on legal opinions and their citations. In the past few years a number of studies of the structure of legal citation networks have been published using data derived from these services [186, 187, 295, 314].
- Location 23595
-
In principle it would be possible also to construct networks of cocitation or bibliographic coupling between either patents or legal opinions, but we are not aware of any studies yet published of such networks.
- Location 23595
-
- [note::What are the main advantages of these approaches in the context of legal networks?]
Other information networks
Peer-to-peer networks
Peer-to-peer file-sharing networks (sometimes abbreviated P2P) are a widely used form of computer network that combines aspects of information networks and technological networks. A peer-to-peer network is a network in which the nodes are computers containing information in the form, usually, of discrete files, and the edges between them are virtual links established for the purpose of sharing the contents of those files. The links exist only in softwareâthey indicate only the intention of one computer to communicate with another should the need arise.
- Location 24030
-
Recommender networks
Recommender networks represent peopleâs preferences for things, such as for certain products sold by a retailer. Online merchants, for instance, may keep records of which customers bought which products and sometimes ask them whether they liked the products. Many large supermarket chains record the purchases made by their regular customers (usually identified by a small card with a barcode on it that is scanned when purchases are made) and so can work We encountered bipartite out which products each customer buys frequently.
- Location 24904
-
Keyword indexes
Another type of information network, also bipartite in form, is the keyword index. An example is the index at the end of this book, which consists of a list of words or phrases, each accompanied by the numbers of the pages on which related information can be found. An index of this kind can be represented as a bipartite network, with two types of nodes representing words and pages, and an edge connecting each word to the pages on which it appears. In addition to their use in books, keyword indexes are routinely constructed as guides to other information collections, including sets of academic papers and the World Wide Web.
- Location 25342
-
- [note::Huh, never would've thought about this as a type of network! Interesting way to think about it.]
In the context of document retrieval, the classic method for determining document similarity and performing generalized searches of this type is latent semantic analysis, which is based on the application of the matrix technique known as singular value decomposition to the bipartite network of keywords and documents [288]. A number of other competing methods have also been developed in recent years, using techniques such as non-negative matrix factorization [291, 292], latent Dirichlet allocation [63], and other probabilistic approaches [236].
- Location 25779
- latent dirichlet allocation, latent semantic analysis, negative matrix factorization,
Social networks
a social network is any network in which the nodes represent people and the edges represent some form of connection between them, such as friendship.
- Location 26215
- blue,
The empirical study of social networks
One of the most important things to appreciate about social networks is that there are many different possible definitions of an edge in such a network and the particular definition one uses will depend on what questions one is interested in answering. Edges might represent friendship between individuals, but they could also represent professional relationships, exchange of goods
- Location 26654
- c1,
or money, communication patterns, romantic or sexual relationships, or many other types of connection.
- Location 27089
-
Direct questioning of experimental subjects is probably the most common method of determining the structure of social networks. We discuss it in detail in Section 4.2.
- Location 27089
-
Such is the power of social network analysis that its techniques have, since the time of Moreno and Davis et al., been applied to an extraordinary variety of different communities, issues, and problems [79]: friendship and acquaintance patterns in local communities and in the population at large [54,55,261,333,447, 452] and among university students [446,479] and schoolchildren [169,338,400]; ... contacts between business people and other professionals [117, 197]; boards of directors of companies [130, 131, 318]; collaborations of scientists [218, 219, 349], movie actors [20, 466], and musicians [206]; sexual contact networks [271, 305, 392, 411, 417] and dating patterns [52, 238]; covert and criminal networks such as networks of drug users [421] or terrorists [282]; historical networks [51, 377]; online communities such as Usenet [313, 431, 449] and Facebook [278, 298, 446, 452]; and social networks of animals [180, 315, 418, 419].
- Location 27092
- social network analysis, 1resource/paper,
Figure 4.2: The affiliation network of the âSouthernWomen Study.â This network (like all affiliation networks) has two types of node, the open circles at the bottom representing the 18 women who were the subjects of the study and the shaded circles at the top representing the social events they attended. The edges connect each woman to the events she attended. Data courtesy of L. Freeman and originally from Davis et al. [129].
- Location 27526
-
- [note::Reminds me of the molecule/drug development study I witnessed at 2024 Penn Grad Talks. Or maybe it was the one about the Russian drug market?]
Interviews and questionnaires
The most common general method for gathering data on social networks is simply to ask people questions. If you are interested in friendship networks, you ask people who their friends are. If you are interested in business relationships you ask people who they do business with, and so forth. The asking may take the form of direct interviews with participants or the completion of questionnaires, either on paper or electronically.
- Location 27963
-
- [note::I believe this is what was done by the author of the network stewardship book I read]
By using a questionnaire, the designers of the study can guarantee that questions are asked, to a good approximation, in a consistent order and with consistent wording. By employing an interviewer to do the asking the study gains flexibility and reliabilityâinterviewees often take studies more seriously when answering questions put to them by a human beingâand interviewers may be given some latitude to probe interviewees when they are unclear, unresponsive, or confused. These are important considerations, since misunderstanding and inconsistent responses to survey questions are substantial sources of error [320].
- Location 27963
-
- [note::This speaks to the fundamental challenge of social network analysis and network stewardship: accurately representing the structure of the network]
A good introduction to social survey design and implementation is given by Rea and Parker [403].
- Location 27964
-
To measure social networks, surveys typically employ a name generator, a question or series of questions that invite respondents to name others with whom they have contact of a specific kind. For example, in their classic study of friendship networks among schoolchildren, Rapoport and Horvath [400] asked children to complete a questionnaire that included items worded as follows: My best friend at Junior High School is: My second-best friend at Junior High School is: My third-best friend at Junior High School is: . . . My eighth-best friend at Junior High School is:
- Location 27964
-
This is a common issue: it is highly likely that any group of individuals surveyed will have at least some ties outside the group and one must decide what to do with these ties. Sometimes they are recorded. Sometimes, as here, they are not. Such details can be important since statistics derived from survey results will often depend on the decisions made.
- Location 28400
-
There are a number of points to note about the data produced by name generators. First, the network ties, friendships in the case above, are determined by one respondent nominating another. This is a fundamentally asymmetric process. Individual A identifies individual B as their friend. In many cases B will also identify A as their friend, but there is no guarantee that this will We encountered directed happen and it is not uncommon for nomination to go in only one direction. We networks previously in Chapter 1, in our discussion of the World Wide Web, and they are discussed in more detail in Section 6.4. normally think of friendship as a two-way relationship, but surveys suggest that this not always the case. As a result, data derived from name generators are often best represented as directed networks, networks in which edges run in a particular direction from one node to another. If two individuals nominate each other then we have two directed edges, one pointing in either direction.
- Location 28400
-
- [note::Networks produced via name generator surveys should often be represented as unidirectional, not bidirectional (since you can't assume that a given pair of Individuals will both consider each other friends 100% of the time)]
Each node in the network then has two degrees, an out-degreeâthe number of friends identified by the corresponding individualâand an in-degreeâthe number of others who identified the individual as a friend.
- Location 28401
- blue,
It is common, as in the example above, for the experimenter to place a limit on the number of names a respondent can give. In the study of Rapoport and Horvath this limit was eight. Studies that impose such a limit are called fixed choice studies. The alternative is a free choice study, which imposes no limit.
- Location 28402
- blue,
Not all surveys employing name generators produce directed networks. Sometimes we are interested in ties that are intrinsically symmetric between the two parties involved, in which case the edges in the network are properly represented as undirected. An example is networks of sexual contact, which are widely studied to help us understand the spread of sexually transmitted diseases [271,305,392,417]. In such networks a tie between individuals A and B means that A and B had sex.
- Location 28838
-
Surveys can and often do ask respondents not just to name those with whom they have ties but to describe the nature of those ties as well. For instance, questions may ask respondents to name people they both like and dislike, or to name those with whom they have certain types of contact, such as socializing together, working together, or asking for advice. For example, in a study of the social network of a group of medical doctors, Coleman et al. [117] asked respondents the following questions: Who among your colleagues do you turn to most often for advice? With whom do you most often discuss your cases in the course of an ordinary week? Who are the friends among your colleagues whom you see most often socially?
- Location 28839
-
A survey such as this, which asks about several types of interactions, ... effectively generates data on several different networks at onceâthe network of advice, the discussion network, and so forthâbut all built upon the same set of nodes. Networks such as this are sometimes called âmultilayerâ or âmultiplexâ Multilayer networks are networks.
- Location 28840
- blue,
Some of the most interesting results of social network The common tendency of studies concern the extent to which peopleâs choice of whom they associate people to associate with others who are similar to themselves in some way is called âhomophilyâ or âassortative mixing,â and we discuss it in detail in Sections 7.7 and 10.7. with reflects their own background and that of their associates. For instance, you might choose to socialize primarily with others of a similar age to yourself, but turn for advice to those who are older than you.
- Location 29274
-
- [note::Social network studies demonstrate the effect of homophily and other aspects of human behavior]
The main disadvantages of network studies based on direct questioning of participants are that they are first laborious and second inaccurate. The administering of interviews or questionnaires and the collation of the responses is a demanding job that has been only somewhat eased by the use of computers and online survey tools. As a result, most studies have been limited to a few tens or at most hundreds of respondentsâthe 34-node social network of Fig. 1.2 is a typical example. It is a rare study that contains more than a thousand actors, and studies such as the National Longitudinal Study of Adolescent Health,3 which compiled responses from over 90 000 participants, are very unusual and extraordinarily costly. Only a substantial public interest such as, in that case, the control of disease, can justify the expense of performing them.
- Location 29274
- network structure, network studies,
- [note::In the case of COVID, NOVID was an interesting use case - they used bluetooth to detect when one person was in close contact with another, allowing a real-time contract-tracing network to be built]
Data based on direct questioning may also be affected by biases of various kinds. Answers given by respondents are always to some extent subjective. If you ask people who their friends are, for instance, different people will interpret âfriendâ in different ways and thus give different kinds of answers, despite the best efforts of investigators to pose questions and record the answers in a uniform fashion. This problem is not unique to network studies. Virtually all social surveys suffer from such problems and a large body of expertise Experimental error in net- concerning techniques for dealing with them has been developed [320, 403].
- Location 29275
- pink,
Ego-centered networks
At the other end of the spectrum from sociometric studies lie studies of personal networks or ego-centered networks. 4 An ego-centered network is the network surrounding one particular individual, meaning that individual plus his or her immediate contacts. The individual in question is referred to as the ego and the contacts as alters.
- Location 29711
- blue,
An ego-centered network consisting of an ego and five alters.
- Location 29711
-
Typically, one constructs not just a single ego-centered network but several, centered on different egos drawn from the target population. In a telephone survey, for instance, one might call random telephone numbers in the target area and survey those who answer, asking them to identify others with whom they have a certain type of contact. Participants might also be asked to describe some characteristics both of themselves and of their alters, and perhaps to answer some other simple questions, such as which alters also have contact with one another.
- Location 29711
-
- [note::Is this the approach that was taken in that one study about identifying rumor spreaders in India to help promote social programs?]
Sometimes, however, we are primarily interested in local network properties, and ego-centered network studies can give us good data about these. For example, if we wish to know about the degrees of nodes in a networkâthe numbers of ties people haveâ then a study in which a random sample of people are each asked to list their contacts may give us everything we need. (Studies probing node degrees are discussed more below.) If we also gather data on contacts between alters, we can estimate clustering coefficients (see Section 7.3). If we have data on characteristics of egos and alters we can measure assortative mixing (Sections 7.7 and 10.7).
- Location 29712
-
Direct observation
One arena in which direct observation is essentially the only viable experimental technique is studies of the social networks of animals, since clearly animals cannot be surveyed using interviews or questionnaires. Not all animal species form interesting social networks, but informative studies have been performed of, among others, monkeys [180,418,419], kangaroos [214], and dolphins [121, 315].
- Location 31021
-
Data from archival or third-party records
Time-varying time-varying networks are sometimes called longitudinal studies, and you may occasionally encounter this term in the literature. networks have been the focus of increasing research attention in recent years. We discuss them further in Section 6.7.
- Location 31895
-
Studies of weblogs and journals have been performed, for example, by Adamic and Glance [4] and MacKinnon and Warren [317].
- Location 31897
-
Affiliation networks
An affiliation network is a network in which actors are connected via their membership in groups of some kind.
- Location 31898
- blue,
the most complete representation of an affiliation network is a network with two types of nodes representing the actors and the groups, with edges connecting actors to the groups to which they belongâsee Fig. 4.2 on page 50. In such a representation, We study bipartite networks in more detail in Section 6.6. called a âbipartite networkâ or âtwo-mode network,â there are no edges connecting actors directly to other actors or groups to other groups, only actors to groups.
- Location 32332
-
Also in the business domain, a number of studies have been conducted of the networks formed by the boards of directors of companies [130,131,318], where the actors are company directors and the groups are the boards on which they sit.
- Location 32333
-
The small-world experiment
Snowball sampling, contact tracing, and random walks
Techniques have been developed, however, for sampling these populations by making use of the social networks that connect their members together. The most widely used such technique is snowball sampling [162, 188, 445]. Note that, unlike the other experimental techniques discussed in this chapter, snowball sampling is not intended as a technique for probing the structure of social networks. Rather, it is a technique for studying hidden populations
- Location 34081
- c1, network structure, snowball sampling,
that relies on social networks for its operation. It is important to keep this distinction clear. To judge by the literature, some professional network scientists do not, a mistake that can result in erroneous conclusions and bad science.
- Location 34516
-
- [note::c2]
So investigators probe the population instead by getting some of its members to provide contact details for others. The typical survey starts offrather like a standard ego-centered network study (see Section 4.2.1). You find one initial member of the population of interest and interview them about themselves. Then, upon gaining their confidence, you invite them also to name other members of the target population with whom they are acquainted. Then you go and find those acquaintances and interview them in turn, asking them also to name further contacts, and so forth through a succession of âwavesâ of sampling. Pretty soon the process âsnowballsâ and you have a large sample of your target population to work with.
- Location 34517
-
particular, snowball sampling gives highly biased samples. In the limit of a large number of waves, snowball sampling samples actors non-uniformly with probability proportional to their âeigenvector centralityâ (see Section 7.1.2). In principle, knowing this, one could compensate for the non-uniformity by appropriately weighting the results, but in practice the limit of large number of waves is rarely reached, and in any case the eigenvector centrality cannot be calculated without knowledge of the complete contact network, which by definition we donât have, making correct weighting impossible. In short, snowball sampling gives biased samples of populations and there is little we can do about it. Nonetheless, the technique is sufficiently useful for finding populations that are otherwise hard to pin down that it has been widely used, biases and all, in studies over the past few decades.
- Location 34518
-
A technique closely related to snowball sampling is contact tracing, which is essentially a form of snowball sampling applied to disease incidence. Some diseases, such as tuberculosis and HIV, are considered in many countries to be sufficiently serious that, when someone is discovered to be carrying them, an effort must be made to track down all those who might also have been infected. Thus, when a patient tests positive for HIV, for instance, he or she will be questioned about recent sexual contacts, and possibly about other types of potentially disease-causing contacts, such as needle sharing if the patient is an injection drug user. Then health authorities will make an effort to track down the people so identified and test them for HIV also. The process is repeated with any who test positive, tracing their contacts as well, and so forth, until all leads have been exhausted.
- Location 34954
-
Data from contact tracing studies display biases similar in type and magnitude to those seen in snowball sampling and should be treated with the same caution. Indeed, they may contain extra biases as well, since contacts are rarely pursued when an individual tests negative for the disease in question, so the sample is necessarily dominated by carriers of the disease, who are themselves usually a biased sample of the population at large.
- Location 34955
-
There is another variant of snowball sampling that deals to some extent with the problems of bias in the sample. This is random-walk sampling [270, 445]. In this method one again starts with a single member of the target community and interviews them to determine their contacts. Then, however, instead of tracking down all of those contacts, one chooses one of them at random and interviews only that one at the next step. If the person in question cannot be found or declines to be interviewed, one chooses another contact, and the process is repeated.
- Location 34955
-
The advantage of the random-walk sampling method is that, as shown in Section 6.14.3, the asymptotic sampling probability of nodes in a random walk is simply proportional to node degree (see Eq. (6.44)). Whatâs more, the asymptotic regime in such studies is, unlike snowball sampling, reached quite quickly for relatively small sample sizes.8 Knowing this, and given that we determine degree (i.e., the number of contacts an individual has) as a part of the interview process, we can easily compensate for sampling bias by a suitable weighting of the results and make population estimates of quantities in a way that is, in theory at least, unbiased. In practice, many sources of bias remain, particularly those associated with participant subjectivity, inability to recall contacts, and non-participation of named contacts. Still, random-walk sampling is a significant improvement on standard snowball sampling. Its principal disadvantage is that it is relatively slow. Since the participants are interviewed serially, in a chain, rather than in parallel waves, a strict implementation of the method can take a long time to develop a large sample.
- Location 35390
-
In some cases it is considered unethical to get participants to name their contacts, particularly when the topic of a study is one of dubious legality, and permission to perform such studies may be withheld by the authorities. To circumvent this problem one can make use of respondent-driven sampling [421]. In this technique, participants are ... usually paid to take part, and enrollment is achieved by handing out tickets to interviewees. Rather than asking people to name their contacts, the interviewees are simply invited to give the tickets to their acquaintances, and told that both they and the acquaintances will receive payment if the acquaintance brings the ticket to the investigator and agrees to participate in the study. In this way, no one is ever asked to name names and all participants have actively volunteered their participation.
- Location 35392
-
In the case where a single ticket is given to each participant, the method is roughly equivalent to random-walk sampling and should in theory give a less biased sample than snowball sampling for the same reasons. In practice, a new bias is introduced because the recipient of the ticket is not necessarily chosen at random from an individualâs acquaintances. Also, tickets frequently get lost or their recipients decline to participate, remuneration notwithstanding, so one normally gives out more than one ticket to each participant, which complicates the sampling process and can introduce further biases [413].
- Location 35827
-
Biological networks
- Location 36264
- c1,
Biochemical networks
Metabolic networks
Proteinâprotein interaction networks
Genetic regulatory networks
Other biochemical networks
- Location 43256
-
Networks in the brain
Networks of neurons
- Location 44129
-
- [note::h3]
Networks of functional connectivity in the brain
Ecological networks
Food webs
A speciesâ position in this pecking order is called by ecologists its trophic level. Species at the bottom of the food web, of which there is just one in our exampleâthe phytoplanktonâhave trophic level 1. Those that prey on themâ krill, herbivorous planktonâhave trophic level 2, and so forth all the way up to the species at the top of the web, which have no predators at all.
- Location 48498
- blue,
Other ecological networks
The US National Center for Ecological Analysis and Synthesis has assembled an large collection of mutualistic network data in their Interaction Web Database, which can be found at http://www.nceas.ucsb.edu/interactionweb.
- Location 49809
- 1resource/data,
Mathematics of networks
Graph theory is a large field containing many results and we describe only a small fraction of them here, focusing on those most relevant to our goals in this book. Readers interested in pursuing the study of graph theory further might like to look at the books by Harary [230] or West [467].
- Location 51556
-
Networks and their representation
Table 6.1: Some examples of nodes and edges in networks
- Location 51993
-
Nodes and edges are also called sites and bonds in physics, and actors and ties in sociology.1 Edges may also be variously called links, connections, or interactions, depending on context.
- Location 51993
-
In the rare cases where there can be more than one edge between the same nodes we refer to those edges collectively as a multiedge.
- Location 51993
- blue,
Edges that connect nodes to themselves are called self-edges or self-loops.
- Location 51994
- blue,
A network that has neither self-edges nor multiedges is called a simple a special name given to networks with self-edges. They are just called ânetworks with self-edges.â network or simple graph. A network with multiedges is called a multigraph.
- Location 51994
- blue,
The adjacency matrix
The fundamental mathematical representation of a network is the adjacency matrix.
- Location 51994
- blue,
Figure 6.1: Two small networks. (a) A simple graph, i.e., one having no multiedges or self-edges. (b) A network with both multiedges and self-edges.
- Location 52429
-
Two points to note about the adjacency matrix are, first, that for a network such as this with no self-edges the diagonal matrix elements are all zero, and, second, that the matrix is symmetric, since if there is an edge between i and j then there is necessarily an edge between j and i.
- Location 52431
-
A multiedge is represented by setting the corresponding matrix element Aij equal to the multiplicity of the edge. For example, a double edge between nodes i and j is represented by Aij Aji 2.
- Location 52431
-
A single self-edge from node i to itself is represented by setting the corresponding diagonal element Aii of the ... matrix equal to 2. Why 2 and not 1? Essentially, it is because a self-edge from i to i has two ends, both of which are connected to node i. As we will see, many mathematical results concerning the adjacency matrix work out more neatly if the matrix is defined this way, and thus it has become the accepted definition.2
- Location 52431
-
To give an example, the adjacency matrix for the multigraph in Fig. 6.1b is
- Location 52867
-
One can also have multiple self-edges (or âmulti-self-edgesâ perhaps). Such edges are represented by setting the corresponding diagonal element of the adjacency matrix equal to twice the multiplicity of the edge: Aii 4 for a double self-edge, 6 for a triple, and so forth.
- Location 52867
-
Weighted networks
Such weighted or valued networks can be represented mathematically by an adjacency matrix with the elements Aij equal to the weights of the ... corresponding connections. Thus, the adjacency matrix A ©  « 0 2 1 2 0 0.5 1 0.5 0 ÂȘ Âź ÂŹ (6.4) represents a weighted network in which the connection between nodes 1 and 2 is twice as strong as that between 1 and 3, which in turn is twice as strong as that between 2 and 3. ... undirected case. With this choice, formulas and results involving the adjacency matrix work out most neatly.
- Location 52868
-
The weights in a weighted network are usually positive numbers, but there Networks with both positive and negative edges are discussed further in Section 7.5 when we consider the concept of structural balance. is no reason in theory why they could not be negative. Social networks of relations between people, for example, are sometimes constructed in which positive edge weights denote friendship or other cordial relationships and negative ones represent animosity.
- Location 53305
-
And if edges can have weights on them, it is not a huge leap to consider weights on nodes too, or more exotic types of values on either edges or nodes, such as vectors or categorical variables like colors.
- Location 53306
-
Directed networks
A directed network or directed graph, also called a digraph for short, is a network in which each edge has a direction, pointing from one node to another.
- Location 53740
- blue,
In general the adjacency matrix of a directed network is asymmetric, since the existence of an edge from i to j does not necessarily imply that there is also an edge from j to i.
- Location 53742
-
An important distinction, however, is that self-edges in a directed network are represented by setting the corresponding diagonal element to 1, not 2 as in the
- Location 53742
- self-edges, directed networks,
Acyclic networks
A cycle in a directed network is a closed loop of edges with the arrows on each of the edges pointing the same way around the loop.
- Location 54178
- blue,
Some directed networks, however, have no cycles and these are called acyclic networks.4,5 A self-edgeâan edge connecting a node to itselfâcounts as a cycle, so acyclic networks also have no self-edges.
- Location 54178
- blue,
So a simple algorithm for determining whether a network is acyclic is: 1. Find a node with no outgoing edges. 2. If no such node exists, the network is cyclic. Otherwise, if such a node does exist, remove it and all its ingoing edges from the network. 3. If all nodes have been removed, the network is acyclic. Otherwise, go back to step 1.
- Location 55052
-
Figure 6.4: A hypergraph and corresponding bipartite graph. These two networks convey the same informationâthe membership of five nodes in four different groups. (a) The hypergraph representation in which the groups are represented as hyperedges, denoted by the loops circling sets of nodes. (b) The bipartite representation in which we introduce four new nodes (open circles at the top) representing the four groups, with edges connecting each of the original five nodes (bottom) to the groups to which it belongs.
- Location 55488
-
Hypergraphs
Families can have more than two people in them and one way to represent such families is to use a generalized type of edge that joins more than two nodes. Such an edge is called a hyperedge and a network with hyperedges is called a hypergraph. 8
- Location 55489
- blue, hypergraph, hyperedge, 1evernote,
Bipartite networks
A bipartite network, also called a two-mode network in the sociology literature, is a network with two kinds of nodes, and edges that run only between nodes We discussed bipartite networks previously in the context of recommender networks in Section 3.3.2, affiliation networks in Section 4.5, and metabolic networks in Section 5.1.1. of different kindsâsee Fig. 6.5. Bipartite networks are most commonly used to represent the membership of a set of people or objects in groups of some kind. The people are represented by one set of nodes, the groups by the other, and edges join the people to the groups to which they belong. When used in this way a bipartite network captures exactly the same information as the hypergraphs of Section 6.5 (see Fig. 6.4) but for most purposes the bipartite graph is more convenient and it is certainly more widely used.
- Location 55926
- blue,
The incidence matrix and network projections
In some cases we would prefer to work with a network with only one type of nodeâa network of people alone, for instance, without the group nodes. One way to create such a network is to get rid of the group nodes and directly join together any two people who belong to the same group, creating a so-called one-mode projection of the two-mode bipartite form.
- Location 56364
- blue,
Figure 6.6: The two one-mode projections of a bipartite network. The central portion of this figure shows a bipartite network with four nodes of one type (open circles labeled A to D) and seven of another (filled circles, 1 to 7). At the top and bottom we show the one-mode projections of the network onto the two sets of nodes.
- Location 56799
-
When we form a one-mode projection, each group in the bipartite network results in a cluster of nodes in the projected network that are all connected to each otherâa âcliqueâ in network jargon (see Section 7.2.1).
- Location 56800
-
Projections are useful and widely employed, but their construction discards a lot of the information present in the original bipartite network and hence they are, in a sense, a less powerful representation of our data. For example, a projection loses any information about how many groups two nodes share in common. ... We can add information of this kind to our projection by making the projection weighted, giving each edge between two nodes in the projected network a weight equal to the number of common groups the nodes share. This weighted network still does not capture all the information in the bipartite originalâit doesnât record the total number of groups or the exact membership of each group for instanceâbut it is an improvement on the unweighted version and is quite widely used.
- Location 56800
-
Multilayer and dynamic networks
Figure 6.7: Multilayer and multiplex networks. (a) A multilayer network consists of a set of layers, each containing its own network, plus interlayer edges connecting nodes in different layers (dashed lines). An example is a transportation network with layers corresponding to airlines, trains, buses, and so forth. (b) A multiplex network is a special case of a multilayer network in which the nodes represent the same set of objects or people in each layer. For instance, a social network with several different types of connections could be represented as a multiplex network with one layer for each type. Dynamic or temporal networks are another example, where the layers represent snapshots over time of the structure of a single, time-varying network. In principle one can include interlayer edges in a multiplex network, as here, to represent the equivalence of nodes in different layers, although in practice these are often omitted.
- Location 57672
-
A multilayer network is a set of individual networks, each representing nodes of one particular type and their connections, plus interlinking edges between networksâsee Fig. 6.7a. The individual networks are referred to as layers. Thus, in our transportation example there might be a layer representing airports and flights, a layer representing train stations and train routes, and so on. Connections between layers could then be used to join nodes that are in the same geographical location, or at least close enough for easy walking. Many airports have train stations, for instance, and many train stations have bus stops. Paths through the resulting multilayer transportation network then represent possible passenger journeys: a passenger catches the bus to the train station for instance (represented by an edge in the bus route layer), walks from the bus stop into the station (an interlayer edge), then takes a train to their destination (an edge in the train route layer).
- Location 57673
- blue,
An important special case of a multilayer network occurs when the nodes in each layer represent the same set of points, objects, or individuals. Such networks, which are called multiplex networks, represent systems in which there ... is only one type of node but more than one type of edge. An archetypal example is a social network, in which the nodes represent people, but there are many different kinds of connections between themâfriendship connections, family connections, business connections, and so forthâeach represented by a separate layer.
- Location 57674
- blue,
Another special case of a multilayer network is a dynamic ortemporal network, a network whose structure changes over time. Most networks in fact do change over timeâthe Internet, the Web, social networks, neural networks, ecological networks, and many others change on a range of different time-scales.
- Location 58110
- blue,
The usual approach is to measure the structure of the network repeatedly at distinct time intervals, resulting in a sequence of snapshots of the system, individual networks that can be thought of collectively as a multilayer network in which the layers have a specific ordering in time. In some examples only the edges change over time and not the nodes, in which case we have a multiplex network. In others the nodes can also appear or disappear, in which case a full multilayer network is needed to capture the structure, with different sets of nodes in different layers and interlayer edges between consecutive layers to indicate which nodes are equivalent.
- Location 58110
-
Mathematically, a multiplex network can be represented by a set of n à n adjacency matrices Aα, one for each layer α (or each time point in the case of a dynamic network). Equivalently, one can think of the elements A α ij of these matrices as forming a three-dimensional tensor, and tensor analysis methods can usefully be applied to multiplex networks [264].
- Location 58112
- multiplex network,
There have been a number of studies of transportation networks of the type discussed above, which are true multilayer networks [135, 198], and a range of other examples can be found in the literature. We refer the interested reader to the reviews by Boccaletti et al. [67], De Domenico et al. [134], and KivelÀ et al. [264], and the book by Bianconi [60].
- Location 58547
- multilayer networks,
Trees
A treeis a connected, undirected network that contains no loopsâsee Fig. 6.8a.9 By âconnectedâ we mean that every node in the network is reachable from every The disconnected parts of a network are called âcomponentsââsee Section 6.12. other via some path through the network. A network can also consist of two or more parts, disconnected from one another, and if an individual part has no loops it is also called a tree. If all the parts of the network are trees, the complete network is called a forest.
- Location 58547
- trees, anki, forests, blue,
Trees also occur commonly in computer science, where they are used as a basic building block for data structures such as AVL trees and heaps [9, 122] and in other theoretical contexts like minimum spanning trees [122], Cayley trees or Bethe lattices [388], and hierarchical models of networks (see Sections 14.7.2 and 18.3.2 and Refs. [109, 268, 465]).
- Location 58984
-
Perhaps the most important property of trees for our purposes is that, since they have no closed loops, there is exactly one path between any pair of nodes. (In a forest there is at most one path, but there may be none.)
- Location 58984
-
Another important property of trees is that a tree of n nodes always has exactly n â 1 edges.
- Location 59420
-
The reverse is also true, that any connected network with n nodes and n â1 edges is a tree. If such a network were not a tree then there must be a loop in the network somewhere, implying that we could remove an edge without disconnecting any part of the network.
- Location 59421
-
Planar networks
A planar network is a network that can be drawn on a plane without having any edges cross.11 Figure 6.9a shows a small planar network. Note that it is in most cases also possible to draw a planar network so that some edges do cross (Fig. 6.9b). The definition of planarity only specifies that at least one arrangement of the nodes exists that results in no crossing.
- Location 59422
- blue,
Figure 6.9: Two drawings of a planar network. (a) A small planar network with four nodes and six edges. It is self-evident that the network is planar, since in this depiction it has no edges that cross. (b) The same network redrawn with two of its edges crossing. Even though the edges cross, the network is still planarâa network is planar if it can be drawn without crossing edges. How you actually draw it is up to you.
- Location 59857
-
What we would really like is some measure of the degree of planarity of a network, a measure that could tell us, for example, that the road network is 99% planar, even though there are a few bridges or tunnels here and there. One possible such measure is the minimum number of edge crossings with which the network can be drawn. This, however, would be a difficult quantity to calculate since, at least in the simplest approach, its evaluation would require us to try every possible way of drawing the network, of which there are an impossibly large number for all but the smallest of networks. Perhaps another approach would be to look at the number of occurrences of K5 or UG in the network. So far, however, no widely accepted metric for degree of planarity has emerged. If such a measure were to gain currency it might well find occasional use in the study of real-world networks.
Degree
- Location 60733
-
The degree of a node in an undirected network is the number of edges connected to itâsee Fig. 6.11. In a social network of friendships between individuals, for instance, a personâs degree is the number of friends they have.
- Location 60733
- blue,
Occasionally we will come across networks in which all nodes have the same degree. In graph theory, such networks are called regular graphs or regular networks. A regular network in which all nodes have degree k is sometimes called k-regular. An example of a regular network is a periodic lattice such as a square or triangular lattice. On the square lattice, for instance, every node has degree four.
- Location 61605
- blue,
Density and sparsity
The maximum possible number of edges in a simple network (i.e., one with no multiedges or self-edges) is n 2 1 2 n(n â 1). The connectance or density Ï of a network is the fraction of those edges that are actually present:
- Location 61605
- blue,
Now consider a sequence of networks of increasing size n. If the density Ï remains non-zero as n becomes large the networks are said to be dense. In a dense network the fraction of non-zero elements in the adjacency matrix is non-vanishing in the limit of large n. A network where Ï â 0 in the limit of large n is said to be sparse, and the fraction of non-zero elements in the adjacency matrix tends to zero.
- Location 61606
- network density,
When we are working with theoretical models of networks, as we will in later chapters of the book, we can take the limit formally and state whether a network is sparse or dense, but in practical situations involving observed
- Location 61607
- c1,
networks we cannot do this. We cannot take the limit as an empirical metabolic network or food web becomes largeâwe are stuck with the network nature gives us. For such networks there is no formal sense in which they are either sparse or dense. Informally, on the other hand, one does often hear a network described, for example, as being sparse. Usually this just means that the value of Ï is small. In this qualitative sense, âsparseâ just means that most of the possible edges that could exist in the network are not present.
- Location 62041
-
- [note::Real-world networks have no formal definition for sparse v.s. dense - they are informally somewhere on the spectrum between sparse or dense.]