Stitching Together Multiple Input and Output Plugins

The information you need to manage often comes from several disparate sources, and use cases can require multiple destinations for your data. Your Logstash pipeline can use multiple input and output plugins to handle these requirements.

In this section, you create a Logstash pipeline that takes input from a Twitter feed and the Filebeat client, then sends the information to an Elasticsearch cluster and also writes it directly to a file.

Reading from a Twitter Feed

To add a Twitter feed, you use the twitter input plugin. To configure the plugin, you need several pieces of information:

  • A consumer key, which uniquely identifies your Twitter app.
  • A consumer secret, which serves as the password for your Twitter app.
  • One or more keywords to search for in the incoming feed. The example uses "cloud" as a keyword, but you can use any keywords you want.
  • An OAuth token, which identifies the Twitter account using this app.
  • An OAuth token secret, which serves as the password for the Twitter account.

Visit https://dev.twitter.com/apps to set up a Twitter account and generate your consumer key and secret, as well as your access token and secret. See the docs for the twitter input plugin if you're not sure how to generate these keys.

Like you did earlier when you worked on Parsing Logs with Logstash, create a config file (called second-pipeline.conf) that contains the skeleton of a pipeline configuration. If you want, you can reuse the file you created earlier, but make sure you pass in the correct config file name when you run Logstash.

Add the following lines to the input section of the second-pipeline.conf file, substituting your values for the placeholder values shown here:

twitter {
    consumer_key => "enter_your_consumer_key_here"
    consumer_secret => "enter_your_secret_here"
    keywords => ["cloud"]
    oauth_token => "enter_your_access_token_here"
    oauth_token_secret => "enter_your_access_token_secret_here"
}
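It is easy to leave one of these placeholder values in place by accident. The following Python sketch is not part of the Logstash tooling; it is a small helper, assumed for illustration, that scans a config file for values still beginning with the enter_your_ prefix used in the placeholders above. The function name find_placeholders is hypothetical.

```python
import re

# Matches lines such as:  consumer_key => "enter_your_consumer_key_here"
# The "enter_your_" prefix is taken from the placeholder values in this guide.
PLACEHOLDER = re.compile(r'^\s*(\w+)\s*=>\s*"(enter_your_[^"]*)"')


def find_placeholders(config_text):
    """Return (line_number, setting, value) for every unreplaced placeholder."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = PLACEHOLDER.match(line)
        if match:
            hits.append((number, match.group(1), match.group(2)))
    return hits
```

For example, find_placeholders(open("second-pipeline.conf").read()) returns an empty list once every placeholder has been replaced with a real value.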

Configuring Filebeat to Send Log Lines to Logstash

As you learned earlier in Configuring Filebeat to Send Log Lines to Logstash, the Filebeat client is a lightweight, resource-friendly tool that collects logs from files on the server and forwards these logs to your Logstash instance for processing.

After installing Filebeat, you need to configure it. Open the filebeat.yml file located in your Filebeat installation directory, and replace its contents with the following lines. Make sure that the paths setting points to your syslog files:

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log 
  fields:
    type: syslog 
output.logstash:
  hosts: ["localhost:5044"]

In this configuration, paths gives the absolute path to the file or files that Filebeat processes, and fields adds a field called type with the value syslog to each event. Because Filebeat places custom fields under a fields key by default, the new field appears in the event as fields.type.

Save your changes.
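The paths setting accepts glob patterns. As a rough check of which files a pattern such as /var/log/*.log picks up on your machine, you can expand it with Python's glob module. This only approximates Filebeat's own matching (Filebeat is written in Go and follows Go's glob rules, which can differ in edge cases such as hidden files), and the helper name matching_files is hypothetical. Note that on many Linux distributions the main syslog file is /var/log/syslog or /var/log/messages, which *.log does not match; list it explicitly in paths if you want Filebeat to read it.

```python
import glob
import os


def matching_files(patterns):
    """Expand each glob pattern and return the sorted regular files it matches.

    Approximates how a Filebeat 'paths' list selects files; Filebeat itself
    uses Go's glob rules, which can differ in edge cases.
    """
    found = set()
    for pattern in patterns:
        for path in glob.glob(pattern):
            # Directories that happen to match the pattern are skipped.
            if os.path.isfile(path):
                found.add(path)
    return sorted(found)
```

For example, matching_files(["/var/log/*.log", "/var/log/syslog"]) lists every file the two patterns would cover together.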

To keep the configuration simple, you won't specify TLS/SSL settings as you would in a real-world scenario.

Configure your Logstash instance to use the Filebeat input plugin by adding the following lines to the input section of the second-pipeline.conf file:

beats {
    port => "5044"
}

Writing Logstash Data to a File

You can configure your Logstash pipeline to write data directly to a file with the file output plugin.

Configure your Logstash instance to use the file output plugin by adding the following lines to the output section of the second-pipeline.conf file:

file {
    path => "/path/to/target/file"
}

Writing to Multiple Elasticsearch Nodes

Writing to multiple Elasticsearch nodes lightens the resource demands on any single node and provides redundant points of entry into the cluster when a particular node is unavailable.

To configure your Logstash instance to write to multiple Elasticsearch nodes, edit the output section of the second-pipeline.conf file to read:

output {
    elasticsearch {
        hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
    }
}

Use the IP addresses of three non-master nodes in your Elasticsearch cluster in the hosts line. When the hosts parameter lists multiple IP addresses, Logstash load-balances requests across the list of addresses. The default Elasticsearch port is 9200, so you can omit the port for any node that listens on it, as the third entry in the configuration above does.
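To make the two rules in the paragraph above concrete, the following Python sketch fills in the default port 9200 for any host entry that omits one, then rotates through the resulting list. The round-robin rotation is a simplified model of load balancing chosen for illustration; it is not necessarily the exact selection strategy the elasticsearch output plugin uses. The sketch assumes plain IPv4 addresses or host names (no IPv6 literals), and the addresses in the usage example are made up.

```python
import itertools

DEFAULT_ES_PORT = 9200  # Elasticsearch's default HTTP port


def normalize_hosts(hosts):
    """Append the default port to any host entry that does not specify one."""
    normalized = []
    for host in hosts:
        # Treat the text after the last ':' as a port only if it is numeric.
        _, sep, port = host.rpartition(":")
        if sep and port.isdigit():
            normalized.append(host)
        else:
            normalized.append(f"{host}:{DEFAULT_ES_PORT}")
    return normalized


def round_robin(hosts, n_requests):
    """Return the host chosen for each of n_requests, rotating through the list."""
    return list(itertools.islice(itertools.cycle(hosts), n_requests))
```

For example, normalize_hosts(["10.0.0.1:9201", "10.0.0.2:9202", "10.0.0.3"]) yields ["10.0.0.1:9201", "10.0.0.2:9202", "10.0.0.3:9200"], and a fourth request in round_robin goes back to the first node.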

Testing the Pipeline

At this point, your second-pipeline.conf file looks like this:

input {
    twitter {
        consumer_key => "enter_your_consumer_key_here"
        consumer_secret => "enter_your_secret_here"
        keywords => ["cloud"]
        oauth_token => "enter_your_access_token_here"
        oauth_token_secret => "enter_your_access_token_secret_here"
    }
    beats {
        port => "5044"
    }
}
output {
    elasticsearch {
        hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
    }
    file {
        path => "/path/to/target/file"
    }
}

With this configuration, Logstash consumes data from the Twitter feed you configured, receives data from Filebeat, and indexes this information to three nodes in an Elasticsearch cluster while also writing it to a file.

At the data source machine, run Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

Filebeat will attempt to connect on port 5044. Until Logstash starts with an active Beats plugin, there won't be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.
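If you want to confirm from the Filebeat machine whether anything is listening on port 5044 yet, a plain TCP connection attempt is enough. The following Python sketch uses only the standard socket module; the function name port_is_open and the two-second timeout are illustrative choices. A successful connection shows only that some process accepts TCP connections on that port, not that it is Logstash's Beats input.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, port_is_open("localhost", 5044) returns False until Logstash has started with the beats input, and True afterwards.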

To verify your configuration, run the following command:

bin/logstash -f second-pipeline.conf --config.test_and_exit

The --config.test_and_exit option parses your configuration file and reports any errors. When the configuration file passes the configuration test, start Logstash with the following command:

bin/logstash -f second-pipeline.conf

Use the grep utility to search the target file and verify that the syslog events are present:

grep syslog /path/to/target/file
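By default, the file output plugin writes each event as one line of JSON, so you can also check the target file with a structured query instead of a plain text search. The following Python sketch counts events whose fields.type value is syslog; the function name is hypothetical, and it assumes you have not changed the file output's codec.

```python
import json


def count_events_by_type(path, wanted_type="syslog"):
    """Count JSON-lines events in path whose fields.type equals wanted_type."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            # Filebeat puts custom fields under the "fields" key by default;
            # events without it (for example, tweets) are simply not counted.
            fields = event.get("fields") or {}
            if fields.get("type") == wanted_type:
                count += 1
    return count
```

For example, count_events_by_type("/path/to/target/file") should return a non-zero count once Filebeat events have reached Logstash.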

Run an Elasticsearch query to find the same information in the Elasticsearch cluster:

curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=fields.type:syslog'

Replace $DATE with the current date, in YYYY.MM.DD format.
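Because the URL is wrapped in single quotes, the shell does not expand $DATE, so you have to type the date in yourself. Alternatively, you can build the index name and query URL in a script. The following Python sketch assumes the default daily index naming logstash-YYYY.MM.DD used in this section, and it defaults to the UTC date on the assumption that the index date comes from the event's @timestamp, which Logstash keeps in UTC. The function names are illustrative.

```python
from datetime import date, datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def logstash_index(day: Optional[date] = None) -> str:
    """Return a daily index name such as 'logstash-2024.03.05'.

    Defaults to today's date in UTC (assumed to match the event @timestamp).
    """
    if day is None:
        day = datetime.now(timezone.utc).date()
    return f"logstash-{day:%Y.%m.%d}"


def search_url(query: str, day: Optional[date] = None,
               base: str = "http://localhost:9200") -> str:
    """Build a _search URL that runs a query-string search against one daily index."""
    # urlencode percent-encodes reserved characters such as ':' in the query.
    params = urlencode({"pretty": "true", "q": query})
    return f"{base}/{logstash_index(day)}/_search?{params}"
```

For example, search_url("fields.type:syslog", date(2024, 3, 5)) returns http://localhost:9200/logstash-2024.03.05/_search?pretty=true&q=fields.type%3Asyslog, and search_url("client:iphone") builds the Twitter query for today's index.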

To see data from the Twitter feed, try this query:

curl -XGET 'http://localhost:9200/logstash-$DATE/_search?pretty&q=client:iphone'

Again, remember to replace $DATE with the current date, in YYYY.MM.DD format.
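To check the result of either query from a script, you mostly need the hit count. The following Python sketch extracts it from a _search response body. It assumes you may see either of two response shapes: older Elasticsearch versions report hits.total as a plain number, while Elasticsearch 7.0 and later report an object with value and relation keys. The function name is hypothetical.

```python
import json
from typing import Union


def total_hits(response: Union[str, dict]) -> int:
    """Return the number of matching documents from an Elasticsearch _search response."""
    if isinstance(response, str):
        response = json.loads(response)
    total = response["hits"]["total"]
    # Elasticsearch 7.0+ returns {"value": n, "relation": "eq" or "gte"}
    # ("gte" means n is a lower bound); earlier versions return an integer.
    if isinstance(total, dict):
        return int(total["value"])
    return int(total)
```

You can pass it the JSON body returned by the curl command above, or by any HTTP client; a result greater than zero means the Twitter events were indexed.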
