BackEnd21
https://drive.google.com/file/d/19hfwx_TQqHoNURyEpq4Si7Vpv2VmJ7sX/view?usp=sharing
Composer (eng)
- Install Docker following the documentation: https://docs.docker.com/machine/install-machine/
- Install docker-compose: sudo apt install docker-compose
- Install VirtualBox: https://tecadmin.net/install-virtualbox-on-ubuntu-18-04/
- Install Confluent Platform: https://docs.confluent.io/current/quickstart/ce-docker-quickstart.html
- Install PostgreSQL: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-18-04-ru
In PostgreSQL, create the insikt database and restore the tables from the dump.
Keystore db
The keystore db also needs to be restored.
Restore it locally from a dump with:
pg_restore -h {ip} -U postgres -W insikt1.sql -d insikt
Note: the database must already exist.
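The restore step above can also be scripted. The following is a minimal sketch, assuming Python 3 is available on the machine; the helper names `restore_cmd` and `restore` are illustrative and not part of the project. It only assembles the same pg_restore arguments as a list, so they can be passed to subprocess without shell quoting.

```python
import subprocess


def restore_cmd(host, dump_file, dbname="insikt", user="postgres"):
    """Build the pg_restore argument list used in the step above."""
    # -W forces a password prompt; -d names the already-created target database.
    return ["pg_restore", "-h", host, "-U", user, "-W", "-d", dbname, dump_file]


def restore(host, dump_file):
    """Run pg_restore and raise CalledProcessError if it exits with an error."""
    subprocess.run(restore_cmd(host, dump_file), check=True)
```

Calling `restore("{ip}", "insikt1.sql")` with the real server address in place of `{ip}` runs the same command as above.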
- Build the container images by running srun.sh in each one. In the backend directory, run make deploy.
- After starting the instance:
systemctl enable systemd-resolved
systemctl start systemd-resolved
- Check Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
sudo docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.15
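Once the container is up, the node can be checked from a script. This is a hedged sketch using only the Python standard library; the base URL `http://localhost:9200` is an assumption that follows from the `-p 9200:9200` port mapping, and the function names are illustrative. It reads the JSON that Elasticsearch returns on its root endpoint and extracts the version number.

```python
import json
import urllib.request


def es_version(body):
    """Extract the version number from the JSON returned by GET / on Elasticsearch."""
    return json.loads(body)["version"]["number"]


def fetch_es_version(base_url="http://localhost:9200"):
    """Query a running node; base_url is an assumption matching the port mapping."""
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return es_version(resp.read().decode("utf-8"))
```

If `fetch_es_version()` raises a connection error, the container is not reachable on port 9200 yet.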
- Adding an index to Elasticsearch
After this, open Kibana in a browser at IP_SERVER:5601.
Once Kibana is loaded, go to the sidebar menu and click on Dev Tools.
You will see the Kibana console.
Open Kibana (/app/kibana#/dev_tools/console?_g=())
Dev Tools – Console
Paste the script code:
PUT demo
{
"settings": {
"number_of_shards": 6,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"default": {
"type": "standard",
"tokenizer": "lowercase",
"filter": [
"asciifolding"
]
}
}
},
"index.requests.cache.enable": true
},
"mappings": {
"tweet": {
"_source": {
"enabled": true
},
"properties": {
"analysis": {
"properties": {
"threatScore": {
"type": "long",
"doc_values": true
},
"concepts": {
"properties": {
"concept": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"docSentiment": {
"type": "double",
"index": true,
"doc_values": true
},
"emotions": {
"properties": {
"emotion": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"entities": {
"type": "nested",
"properties": {
"entity": {
"type": "keyword",
"index": true,
"doc_values": true
},
"entityType": {
"type": "keyword",
"index": true,
"doc_values": true
},
"type": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"hashtags": {
"properties": {
"text": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"keyIdeas": {
"properties": {
"keyIdea": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"screenName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"topics": {
"type": "nested",
"properties": {
"topic": {
"type": "keyword",
"index": true,
"doc_values": true
},
"category": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
}
}
},
"createdAt": {
"type": "date",
"index": true,
"doc_values": true,
"format": "dateOptionalTime"
},
"detectedLang": {
"type": "keyword",
"index": true,
"doc_values": true
},
"geoLocation": {
"properties": {
"latitude": {
"type": "double",
"index": true,
"doc_values": true
},
"longitude": {
"type": "double",
"index": true,
"doc_values": true
}
}
},
"coordinates": {
"index": true,
"type": "geo_point"
},
"geoname": {
"properties": {
"countryCode": {
"type": "keyword",
"index": true,
"doc_values": true
},
"geonameid": {
"type": "integer",
"index": true,
"doc_values": true
},
"name": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"hashtagEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"text": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"id": {
"type": "keyword",
"index": true,
"doc_values": true
},
"idLong": {
"type": "long",
"doc_values": true
},
"mediaEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"mediaURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"mediaURLHttps": {
"type": "keyword",
"index": false,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
}
}
},
"place": {
"properties": {
"boundingBoxCoordinates": {
"properties": {
"latitude": {
"type": "double",
"index": true,
"doc_values": true
},
"longitude": {
"type": "double",
"index": true,
"doc_values": true
}
}
},
"boundingBoxType": {
"type": "keyword",
"index": true,
"doc_values": true
},
"country": {
"type": "keyword",
"index": true,
"doc_values": true
},
"countryCode": {
"type": "keyword",
"index": true,
"doc_values": true
},
"fullName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"id": {
"type": "keyword",
"index": true,
"doc_values": true
},
"name": {
"type": "keyword",
"index": true,
"doc_values": true
},
"placeType": {
"type": "keyword",
"index": true,
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"retweetedStatus": {
"properties": {
"createdAt": {
"type": "date",
"index": true,
"doc_values": true,
"format": "dateOptionalTime"
},
"geoLocation": {
"properties": {
"latitude": {
"type": "double",
"doc_values": true
},
"longitude": {
"type": "double",
"doc_values": true
}
}
},
"hashtagEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"text": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"id": {
"type": "keyword",
"index": true,
"doc_values": true
},
"mediaEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"mediaURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"mediaURLHttps": {
"type": "keyword",
"index": false,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
}
}
},
"place": {
"properties": {
"boundingBoxCoordinates": {
"properties": {
"latitude": {
"type": "double",
"doc_values": true
},
"longitude": {
"type": "double",
"doc_values": true
}
}
},
"boundingBoxType": {
"type": "keyword",
"index": false,
"doc_values": true
},
"country": {
"type": "keyword",
"index": true,
"doc_values": true
},
"countryCode": {
"type": "keyword",
"index": true,
"doc_values": true
},
"fullName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"id": {
"type": "keyword",
"index": false,
"doc_values": true
},
"name": {
"type": "keyword",
"index": true,
"doc_values": true
},
"placeType": {
"type": "keyword",
"index": true,
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"source": {
"type": "keyword",
"index": true
},
"symbolEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"text": {
"type": "keyword",
"index": true
}
}
},
"text": {
"type": "keyword",
"index": true
},
"urlEntities": {
"properties": {
"displayURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"end": {
"type": "long",
"doc_values": true
},
"expandedURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"user": {
"properties": {
"createdAt": {
"type": "date",
"doc_values": true,
"format": "dateOptionalTime"
},
"description": {
"type": "keyword",
"index": true
},
"favouritesCount": {
"type": "long",
"index": true,
"doc_values": true
},
"followersCount": {
"type": "long",
"index": true,
"doc_values": true
},
"friendsCount": {
"type": "long",
"index": true,
"doc_values": true
},
"id": {
"type": "keyword",
"index": true,
"doc_values": true
},
"lang": {
"type": "keyword",
"index": true,
"doc_values": true
},
"location": {
"type": "keyword",
"index": true,
"doc_values": true
},
"name": {
"type": "keyword",
"index": true,
"fields": {
"raw": {
"type": "keyword",
"index": true,
"doc_values": true
}
}
},
"profileImageUrl": {
"type": "keyword",
"index": false,
"doc_values": true
},
"screenName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"statusesCount": {
"type": "long",
"index": true,
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"userMentionEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"id": {
"type": "long",
"doc_values": true
},
"name": {
"type": "keyword",
"index": true
},
"screenName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
}
}
}
}
},
"savedAt": {
"type": "date",
"doc_values": true,
"format": "dateOptionalTime"
},
"source": {
"type": "keyword",
"index": true
},
"symbolEntities": {
"properties": {
"end": {
"type": "long",
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"text": {
"type": "keyword"
}
}
},
"text": {
"type": "keyword",
"index": true
},
"unifiedText": {
"type": "text",
"index": true
},
"unifiedUrls": {
"type": "keyword"
},
"urlEntities": {
"properties": {
"displayURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"end": {
"type": "long",
"doc_values": true
},
"expandedURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"user": {
"properties": {
"createdAt": {
"type": "date",
"doc_values": true,
"format": "dateOptionalTime"
},
"description": {
"type": "keyword",
"index": true
},
"favouritesCount": {
"type": "long",
"doc_values": true
},
"followersCount": {
"type": "long",
"doc_values": true
},
"friendsCount": {
"type": "long",
"doc_values": true
},
"id": {
"type": "keyword",
"index": true,
"doc_values": true
},
"lang": {
"type": "keyword",
"index": true,
"doc_values": true
},
"location": {
"type": "keyword",
"index": true,
"doc_values": true
},
"name": {
"type": "keyword",
"index": true,
"fields": {
"raw": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
},
"profileImageUrl": {
"type": "keyword",
"index": false,
"doc_values": true
},
"screenName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"statusesCount": {
"type": "long",
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
},
"urlEntity": {
"properties": {
"displayURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"end": {
"type": "long",
"doc_values": true
},
"expandedURL": {
"type": "keyword",
"index": false,
"doc_values": true
},
"start": {
"type": "long",
"doc_values": true
},
"url": {
"type": "keyword",
"index": false,
"doc_values": true
}
}
}
}
},
"userMentionEntities": {
"properties": {
"end": {
"type": "long"
},
"id": {
"type": "long"
},
"name": {
"type": "keyword"
},
"screenName": {
"type": "keyword",
"index": true,
"doc_values": true
},
"start": {
"type": "long"
}
}
}
}
}
}
}
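A mapping of this size is easier to review as a flat list of fields. The sketch below is an illustrative helper, not part of the project: it assumes the request body above has been loaded into a Python dict and walks a `properties` section recursively, returning each dotted field path with its type. Fields that only carry sub-properties are reported as `object`.

```python
def field_types(properties, prefix=""):
    """Return {dotted.field.path: type} for an Elasticsearch 'properties' dict."""
    result = {}
    for name, spec in properties.items():
        path = prefix + name
        # A field without an explicit type but with sub-properties is an object.
        result[path] = spec.get("type", "object")
        if "properties" in spec:
            result.update(field_types(spec["properties"], path + "."))
    return result


sample = {
    "analysis": {
        "properties": {
            "threatScore": {"type": "long"},
            "entities": {
                "type": "nested",
                "properties": {"type": {"type": "keyword"}},
            },
        }
    },
    "createdAt": {"type": "date"},
}
flat = field_types(sample)
```

For the full index, the input would be `body["mappings"]["tweet"]["properties"]`, where `body` is the parsed request body.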
sudo apt install python-pytest python-elasticsearch
Inside the script, I switched the ES_CONN variable to the IP address 172.18.0.2
python test_smoke.py
Check
GET demo/_search
{
"query": {
"match_all": {}
}
}
- MySQL
The tmp/pyalerts directory contains the file below; comment out the index creation in it (if the file is being used for the first time):
mysql-tested.sql
Run the command; when prompted for the password, enter: test
mysql -h 172.18.0.4 -u test -p insikt < mysql-tested.sql
Then open the MySQL console and create two more tables:
mysql -h 172.18.0.4 -u test -p insikt
CREATE TABLE language (id INT auto_increment PRIMARY KEY, name text, status tinyint(1));
CREATE TABLE network_analysis (id INT auto_increment PRIMARY KEY, project_id VARCHAR(256) DEFAULT NULL, start date, end date, source varchar(1000), status tinyint(4));
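Commenting out the index creation by hand is error-prone when statements span several lines. The sketch below shows one way to do it in Python. The contents of mysql-tested.sql are not shown in these notes, so it assumes the indexes are created with CREATE INDEX or CREATE UNIQUE INDEX statements that start on their own line and end with a semicolon; the function name is illustrative.

```python
import re

# Matches the first line of a CREATE INDEX / CREATE UNIQUE INDEX statement.
_INDEX_START = re.compile(r"^\s*CREATE\s+(UNIQUE\s+)?INDEX\b", re.IGNORECASE)


def comment_out_indexes(sql):
    """Prefix every line of each CREATE INDEX statement with '-- '."""
    out = []
    inside = False  # True while within a multi-line CREATE INDEX statement
    for line in sql.splitlines():
        if not inside and _INDEX_START.match(line):
            inside = True
        if inside:
            out.append("-- " + line)
            if ";" in line:  # the statement ends at the first semicolon
                inside = False
        else:
            out.append(line)
    return "\n".join(out)
```

Indexes declared inside CREATE TABLE or added with ALTER TABLE are not touched by this sketch.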
- Check the result
sudo docker logs deploy_backend_1
sudo docker logs deploy_frontend_1
sudo docker ps
lists the containers
ps axf
shows the containers in the process list
docker restart – restart the container in memory
docker update – update the container's parameters
sudo docker exec -it {container_name} bash
opens a shell inside the container
DB_PASS_POSTGRESQL = "Pbdivbknn123"
Setting up PostgreSQL on localhost
How to allow remote connections to PostgreSQL database server: https://bosnadev.com/2015/12/15/allow-remote-connections-postgresql-database-server/
Set listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
Also list the IP addresses that are allowed to connect to the database in /etc/postgresql/10/main/pg_hba.conf
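Each allowed address range becomes one host record in pg_hba.conf with the fields type, database, user, address and authentication method. The sketch below formats such a record. The subnet 172.18.0.0/16 in the usage line is only an assumption based on the 172.18.0.x container addresses used elsewhere in these notes, and md5 is one possible method, not necessarily the one this deployment uses.

```python
def pg_hba_line(cidr, database="all", user="all", method="md5"):
    """Format one host record for pg_hba.conf: type, database, user, address, method."""
    return "host    {}    {}    {}    {}".format(database, user, cidr, method)


# Hypothetical subnet covering the Docker containers
line = pg_hba_line("172.18.0.0/16")
```

PostgreSQL has to be restarted after listen_addresses is changed; a reload is enough for changes to the host records.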
In the docker-compose.yml file, for containers to access Kafka, set this in the environment variables of the broker service:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://broker:9092
For access to Kafka from outside the containers:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
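The two variants differ only in the host advertised on port 9092. A small sketch makes that explicit; the function name is illustrative, and it simply builds the value of KAFKA_ADVERTISED_LISTENERS from the listener names and ports given above.

```python
def advertised_listeners(external_host, internal_host="broker"):
    """Build the KAFKA_ADVERTISED_LISTENERS value for the broker service."""
    listeners = [
        ("PLAINTEXT", internal_host, 29092),      # listener on port 29092
        ("PLAINTEXT_HOST", external_host, 9092),  # listener on port 9092
    ]
    return ",".join("{}://{}:{}".format(name, host, port) for name, host, port in listeners)


inside_containers = advertised_listeners("broker")
outside_containers = advertised_listeners("localhost")
```

A client connects to whatever address the broker advertises, so `localhost` only works for clients running on the Docker host itself.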
System restart
cd deploy
sudo docker-compose down – shut down all containers
sudo docker stop {container name} – stop
sudo docker rm {container name} – delete
sudo docker-compose up -d – start all containers from the docker-compose.yml file
- Storm
https://streamparse.readthedocs.io/en/stable/quickstart.html
The script that installs Storm automatically from the local Linux system over ssh is at ~/tmp/supervisord/storm/storm.py
You also need the archive with the Storm distribution, version 1.0.6 – c.tar.gz
and the archive of the daemon used to run Storm, supervisord – s.tar.gz.
Oracle Java is stored there as well – jdk-8u192-linux-x64.tar.gz
Both archives are saved in ~/tmp/supervisord/
To install Java, use the command:
apt install default-jdk
Then, for completeness, place Oracle Java in the /usr/lib/jvm directory next to OpenJDK; sometimes the original Java is a better fit.
To use Oracle Java, it is enough to set the environment variable in the relevant configs:
JAVA_HOME=/usr/lib/jvm/jdk1.8.0_192
Add PATH="/opt/storm/current/bin:$PATH" to ~/.profile
To start Storm, run:
sudo /etc/init.d/supervisor start
To stop Storm, use the same command with stop instead of start.
storm version
http://ip:8080/index.html
- Leiningen
https://github.com/technomancy/leiningen#leiningen
Creating an additional bin directory in the home directory ~ is optional.
cd ~
mkdir bin
cd ~/bin
or
cd ~/.local/bin
wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
chmod +x ~/bin/lein
or
chmod +x ~/.local/bin/lein
Then:
lein version
- Streamparse
sudo pip3 install streamparse
sparse quickstart wordcount
cd wordcount
In the project.clj file, you need to change the version of Storm in line 6:
:dependencies [[org.apache.storm/storm-core "1.0.6"]
Check that it works:
sparse run
To run the task on the Storm cluster rather than on the local library, use the command below (the IPs must be configured in the config.json file):
sparse submit
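The version change in project.clj can be scripted instead of edited by hand. This is a minimal sketch with an illustrative function name; it assumes the dependency is written as `org.apache.storm/storm-core` followed by a quoted version, as in the line shown above, and replaces only that version string.

```python
import re


def set_storm_version(project_clj, version):
    """Replace the storm-core version string in the text of project.clj."""
    pattern = r'(org\.apache\.storm/storm-core\s+")[^"]*(")'
    # A function replacement avoids backslash/group-reference issues in `version`.
    return re.sub(pattern, lambda m: m.group(1) + version + m.group(2), project_clj)


before = ':dependencies [[org.apache.storm/storm-core "1.1.0"]'
after = set_storm_version(before, "1.0.6")
```

Text that does not contain the storm-core dependency is returned unchanged.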
- TensorFlow
For a first look at what neural networks can do, this link is a good start:
https://www.tensorflow.org/tutorials/keras/classification?hl=en
Copy the NLP_Engine_v2.tar.gz archive to any place on the disk where there is free space.
Extract it in the same directory. This matters because the model is large:
tar -zxvf NLP_Engine_v2.tar.gz -C .
The trailing dot in the command means extract into the current directory.
Then you need to install the necessary Python packages:
sudo pip3 install nltk numpy regex stanfordnlp joblib lmdb vaderSentiment polyglot pycld2 morfessor keras tensorflow sklearn elasticsearch pandas
sudo apt install python3-mysql.connector python3-pycurl
Then go to the NLP_Engine_v2 directory and execute the command:
source env/bin/activate
Note the nlp directory. This is the Python module that performs the analysis.
After that, set the paths to the local Python module nlp in:
nano nlp_engine/src/bolts/tweet_analysis.py
In another file, adjust the environment variables; when running through the container, do this in the docker-compose.yml file instead.
nano nlp_engine/src/spouts/tweets.py
Then go to the nlp_engine directory and run the command:
sparse run
To run on the Storm cluster, you need to adjust the IP addresses in the config.json file and use the command:
sparse submit
- A container to run NLP
To run machine analysis, go to the directory … and run srun.sh
- Clause. Additions and thoughts out loud.
The command below can be used to download (mirror) a site. The resulting directories can be used to test the parsing containers on extracting information.
To compile a map or site index, in principle, you can use the standard features of Storm.
If this content is filtered successfully, it can be used to train the neural network.
wget -m -l 10 -e robots=off -p -k -E --reject-regex "wp" --no-check-certificate -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" forum.katera.ru
A trained neural network can be used to generate messages that keep a dialogue going. For any question, it could immediately give a link to the source on this forum.
Composer (ru)
- Install Docker following the documentation: https://docs.docker.com/machine/install-machine/
- Install docker-compose: sudo apt install docker-compose
- Install VirtualBox: https://tecadmin.net/install-virtualbox-on-ubuntu-18-04/
- Install Confluent Platform: https://docs.confluent.io/current/quickstart/ce-docker-quickstart.html
- Install PostgreSQL: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-18-04-ru
In PostgreSQL, create the insikt database and restore the tables from the dump.
Keystore db
The keystore db also needs to be restored.
Restore it locally from a dump with:
pg_restore -h 75.126.254.59 -U postgres -W insikt1.sql -d insikt
Note: the database must already exist.
- Build the container images by running srun.sh in each one. In the backend directory, run make deploy.
- After starting the instance:
systemctl enable systemd-resolved
systemctl start systemd-resolved
- Check Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
sudo docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.15
- Adding an index to Elasticsearch
After this, open Kibana in a browser at IP_SERVER:5601.
Once Kibana is loaded, go to the sidebar menu and click on Dev Tools.
You will see the Kibana console.
Open Kibana (/app/kibana#/dev_tools/console?_g=())
Dev Tools – Console
Paste the script code:
PUT demo
{
“settings”: {
“number_of_shards”: 6,
“number_of_replicas”: 1,
“analysis”: {
“analyzer”: {
“default”: {
“type”: “standard”,
“tokenizer”: “lowercase”,
“filter”: [
“asciifolding”
]
}
}
},
“index.requests.cache.enable”: true
},
“mappings”: {
“tweet”: {
“_source”: {
“enabled”: true
},
“properties”: {
“analysis”: {
“properties”: {
“threatScore”: {
“type”: “long”,
“doc_values”: true
},
“concepts”: {
“properties”: {
“concept”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“docSentiment”: {
“type”: “double”,
“index”: true,
“doc_values”: true
},
“emotions”: {
“properties”: {
“emotion”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“entities”: {
“type”: “nested”,
“properties”: {
“entity”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“entityType”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“type”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“hashtags”: {
“properties”: {
“text”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“keyIdeas”: {
“properties”: {
“keyIdea”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“screenName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“topics”: {
“type”: “nested”,
“properties”: {
“topic”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“category”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
}
}
},
“createdAt”: {
“type”: “date”,
“index”: true,
“doc_values”: true,
“format”: “dateOptionalTime”
},
“detectedLang”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“geoLocation”: {
“properties”: {
“latitude”: {
“type”: “double”,
“index”: true,
“doc_values”: true
},
“longitude”: {
“type”: “double”,
“index”: true,
“doc_values”: true
}
}
},
“coordinates”: {
“index”: true,
“type”: “geo_point”
},
“geoname”: {
“properties”: {
“countryCode”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“geonameid”: {
“type”: “integer”,
“index”: true,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“hashtagEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“text”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“id”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“idLong”: {
“type”: “long”,
“doc_values”: true
},
“mediaEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“mediaURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“mediaURLHttps”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
}
}
},
“place”: {
“properties”: {
“boundingBoxCoordinates”: {
“properties”: {
“latitude”: {
“type”: “double”,
“index”: true,
“doc_values”: true
},
“longitude”: {
“type”: “double”,
“index”: true,
“doc_values”: true
}
}
},
“boundingBoxType”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“country”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“countryCode”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“fullName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“id”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“placeType”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“retweetedStatus”: {
“properties”: {
“createdAt”: {
“type”: “date”,
“index”: true,
“doc_values”: true,
“format”: “dateOptionalTime”
},
“geoLocation”: {
“properties”: {
“latitude”: {
“type”: “double”,
“doc_values”: true
},
“longitude”: {
“type”: “double”,
“doc_values”: true
}
}
},
“hashtagEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“text”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“id”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“mediaEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“mediaURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“mediaURLHttps”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
}
}
},
“place”: {
“properties”: {
“boundingBoxCoordinates”: {
“properties”: {
“latitude”: {
“type”: “double”,
“doc_values”: true
},
“longitude”: {
“type”: “double”,
“doc_values”: true
}
}
},
“boundingBoxType”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“country”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“countryCode”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“fullName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“id”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“placeType”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“source”: {
“type”: “keyword”,
“index”: true
},
“symbolEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“text”: {
“type”: “keyword”,
“index”: true
}
}
},
“text”: {
“type”: “keyword”,
“index”: true
},
“urlEntities”: {
“properties”: {
“displayURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“end”: {
“type”: “long”,
“doc_values”: true
},
“expandedURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“user”: {
“properties”: {
“createdAt”: {
“type”: “date”,
“doc_values”: true,
“format”: “dateOptionalTime”
},
“description”: {
“type”: “keyword”,
“index”: true
},
“favouritesCount”: {
“type”: “long”,
“index”: true,
“doc_values”: true
},
“followersCount”: {
“type”: “long”,
“index”: true,
“doc_values”: true
},
“friendsCount”: {
“type”: “long”,
“index”: true,
“doc_values”: true
},
“id”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“lang”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“location”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true,
“fields”: {
“raw”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
}
}
},
“profileImageUrl”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“screenName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“statusesCount”: {
“type”: “long”,
“index”: true,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“userMentionEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“id”: {
“type”: “long”,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true
},
“screenName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
}
}
}
}
},
“savedAt”: {
“type”: “date”,
“doc_values”: true,
“format”: “dateOptionalTime”
},
“source”: {
“type”: “keyword”,
“index”: true
},
“symbolEntities”: {
“properties”: {
“end”: {
“type”: “long”,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“text”: {
“type”: “keyword”
}
}
},
“text”: {
“type”: “keyword”,
“index”: true
},
“unifiedText”: {
“type”: “text”,
“index”: true
},
“unifiedUrls”: {
“type”: “keyword”
},
“urlEntities”: {
“properties”: {
“displayURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“end”: {
“type”: “long”,
“doc_values”: true
},
“expandedURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“user”: {
“properties”: {
“createdAt”: {
“type”: “date”,
“doc_values”: true,
“format”: “dateOptionalTime”
},
“description”: {
“type”: “keyword”,
“index”: true
},
“favouritesCount”: {
“type”: “long”,
“doc_values”: true
},
“followersCount”: {
“type”: “long”,
“doc_values”: true
},
“friendsCount”: {
“type”: “long”,
“doc_values”: true
},
“id”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“lang”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“location”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“name”: {
“type”: “keyword”,
“index”: true,
“fields”: {
“raw”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
},
“profileImageUrl”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“screenName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“statusesCount”: {
“type”: “long”,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“urlEntity”: {
“properties”: {
“displayURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“end”: {
“type”: “long”,
“doc_values”: true
},
“expandedURL”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
},
“start”: {
“type”: “long”,
“doc_values”: true
},
“url”: {
“type”: “keyword”,
“index”: false,
“doc_values”: true
}
}
}
}
},
“userMentionEntities”: {
“properties”: {
“end”: {
“type”: “long”
},
“id”: {
“type”: “long”
},
“name”: {
“type”: “keyword”
},
“screenName”: {
“type”: “keyword”,
“index”: true,
“doc_values”: true
},
“start”: {
“type”: “long”
}
}
}
}
}
}
}
sudo apt install python-pytest python-elasticsearch
Inside the script, I switched the ES_CONN variable to the IP address 172.18.0.2
python test_smoke.py
Check
GET demo/_search
{
"query": {
"match_all": {}
}
}
- MySQL
The tmp/pyalerts directory contains the file below; comment out the index creation in it (if the file is being used for the first time):
mysql-tested.sql
Run the command; when prompted for the password, enter: test
mysql -h 172.18.0.4 -u test -p insikt < mysql-tested.sql
Then open the MySQL console and create two more tables:
mysql -h 172.18.0.4 -u test -p insikt
CREATE TABLE language (id INT auto_increment PRIMARY KEY, name text, status tinyint(1));
CREATE TABLE network_analysis (id INT auto_increment PRIMARY KEY, project_id VARCHAR(256) DEFAULT NULL, start date, end date, source varchar(1000), status tinyint(4));
- Check the result
sudo docker logs deploy_backend_1
sudo docker logs deploy_frontend_1
sudo docker ps
lists the containers
ps axf
shows the containers in the process list
docker restart – restart the container in memory
docker update – update the container's parameters
sudo docker exec -it {container_name} bash
opens a shell inside the container
DB_PASS_POSTGRESQL = "Pbdivbknn123"
Setting up PostgreSQL on localhost
https://bosnadev.com/2015/12/15/allow-remote-connections-postgresql-database-server/
Set listen_addresses = '*' in /etc/postgresql/10/main/postgresql.conf
Also list the IP addresses that are allowed to connect to the database in /etc/postgresql/10/main/pg_hba.conf
In the docker-compose.yml file, for containers to access Kafka, set this in the environment variables of the broker service:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://broker:9092
For access to Kafka from outside the containers:
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
System restart
cd deploy
sudo docker-compose down – shut down all containers
sudo docker stop {container name} – stop
sudo docker rm {container name} – delete
sudo docker-compose up -d – start all containers from the docker-compose.yml file
- Storm
https://streamparse.readthedocs.io/en/stable/quickstart.html
The script that installs Storm automatically from the local Linux system over ssh is at ~/tmp/supervisord/storm/storm.py
You also need the archive with the Storm distribution, version 1.0.6 – c.tar.gz
and the archive of the daemon used to run Storm, supervisord – s.tar.gz.
Oracle Java is stored there as well – jdk-8u192-linux-x64.tar.gz
Both archives are saved in ~/tmp/supervisord/
To install Java, use the command:
apt install default-jdk
Then, for completeness, place Oracle Java in the /usr/lib/jvm directory next to OpenJDK; sometimes the original Java is a better fit.
To use Oracle Java, it is enough to set the environment variable in the relevant configs:
JAVA_HOME=/usr/lib/jvm/jdk1.8.0_192
Add PATH="/opt/storm/current/bin:$PATH" to ~/.profile
To start Storm, run:
sudo /etc/init.d/supervisor start
To stop Storm, use the same command with stop instead of start.
storm version
http://75.126.254.59:8080/index.html
- Leiningen
https://github.com/technomancy/leiningen#leiningen
Creating an additional bin directory in the home directory ~ is optional.
cd ~
mkdir bin
cd ~/bin
or
cd ~/.local/bin
wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
chmod +x ~/bin/lein
or
chmod +x ~/.local/bin/lein
Then:
lein version
- Streamparse
sudo pip3 install streamparse
sparse quickstart wordcount
cd wordcount
В файле project.clj надо изменить версию Storm в 6 строчке:
:dependencies [[org.apache.storm/storm-core “1.0.6”]
Проверим работоспособность:
sparse run
Для запуска задачи на кластере Storm, а не на локальной библиотеке используют команду (требуеться конфигурация IPs в файле config.json ):
sparse submit
- TensorFlow
Для первого знакомство с возможностями нейросетей подойдет вот эта ссылка:
https://www.tensorflow.org/tutorials/keras/classification?hl=ru
Скопировать архив NLP_Engine_v2.tar.gz в любое место на диске, где есть свободное место.
Разархивировать в той же директории. Это важно так как модель объёмная:
tar -zxvf NLP_Engine_v2.tar.gz -C .
Крайняя точка в команде обозначает разорхивировать в текущую директорию.
Затем надо установить необходимы Питон пакеты:
sudo pip3 install nltk numpy regex stanfordnlp joblib lmdb vaderSentiment polyglot pycld2 morfessor keras tensorflow sklearn elasticsearch pandas
sudo apt install python3-mysql.connector python3-pycurl
Затем зайти в каталог NLP_Engine_v2 и выполнит команду:
source env/bin/activate
Note the nlp directory. It is the Python module that performs the analysis.
After that, set the paths to the local Python module nlp:
nano nlp_engine/src/bolts/tweet_analysis.py
In another file, adjust the environment variables; when running through a container, this should be done in the docker-compose.yml file instead.
nano nlp_engine/src/spouts/tweets.py
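Reading the environment variables could look roughly like the sketch below. The variable names and default values are taken from the docker-compose.yml fragment in this document; the helper load_postgres_settings is hypothetical and is not the actual code of tweets.py.

```python
import os

# Defaults mirror the docker-compose.yml values shown in this document.
DEFAULTS = {
    "POSTGRES_HOST": "postgresql",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DBNAME": "inviso",
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASS": "demo",
}


def load_postgres_settings(environ=None):
    """Collect the PostgreSQL settings, falling back to the defaults above."""
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    settings["POSTGRES_PORT"] = int(settings["POSTGRES_PORT"])  # port as a number
    return settings


# Example with an explicit mapping instead of the real process environment.
settings = load_postgres_settings({"POSTGRES_HOST": "172.18.0.10"})
print(settings["POSTGRES_HOST"], settings["POSTGRES_PORT"])
```

Keeping the defaults in one place means a local run and a container run differ only in what the environment provides.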
Then go to the nlp_engine directory and run:
sparse run
To run on a Storm cluster, correct the IP addresses in the config.json file and use the command:
sparse submit
- Container for running NLP
To start the machine analysis, go to the … directory and run srun.sh
docker-compose.yml
storm:
  image: storm:latest
  volumes:
    - /home/ubuntu/deploy/storm/suite:/suite
  environment:
    - POSTGRES_HOST=postgresql
    - POSTGRES_PORT=5432
    - POSTGRES_DBNAME=inviso
    - POSTGRES_USER=postgres
    - POSTGRES_PASS=demo
  restart: always
  networks:
    default:
      ipv4_address: 172.18.0.25
Dockerfile
FROM ubuntu
RUN mkdir -p /home/ubuntu
WORKDIR /home/ubuntu
RUN apt-get -y -q update && apt-get install -y wget sudo git libicu-dev python htop curl python3 python3-pip python3-pycurl
#INSTALL JAVA 8
COPY jdk1.8.0_192 jdk1.8.0_192
RUN sudo mkdir -p /usr/lib/jvm
RUN sudo ln -s /home/ubuntu/jdk1.8.0_192 /usr/lib/jvm/java-8-oracle
#STORM
COPY apache-storm-1.0.6 apache-storm-1.0.6
COPY nlp nlp
COPY nlp_engine nlp_engine
RUN sudo echo 'JAVA_HOME="/usr/lib/jvm/java-8-oracle"' >> /etc/environment
RUN echo 'PATH="/home/ubuntu/apache-storm-1.0.6/bin:$PATH"' >> /etc/environment
RUN sudo pip3 install -U git+https://github.com/aboSamoor/polyglot.git@master
RUN sudo pip3 install mysql-connector kafka-python streamparse flask nltk numpy regex stanfordnlp joblib lmdb vaderSentiment pycld2 morfessor keras tensorflow sklearn elasticsearch pandas
WORKDIR /home/ubuntu
RUN wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
RUN sudo chmod +x lein
RUN sudo cp /home/ubuntu/lein /usr/local/bin
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
ENV JAVACMD /usr/lib/jvm/java-8-oracle/bin/java
ENV PATH "/home/ubuntu/apache-storm-1.0.6/bin:/usr/lib/jvm/java-8-oracle/bin/:$PATH"
ENV LEIN_ROOT true
COPY start.sh start.sh
CMD bash start.sh
start.sh
#!/bin/bash
storm version
lein version
sparse quickstart wordcount
sleep 50000
- Item. Additions and thinking out loud.
The wget command below can be used to download a whole site. The resulting directories can be used to test the parsing containers that extract information.
To build a site map or index, the standard capabilities of Storm can in principle be used.
If this content is filtered well, it can be used to train a neural network.
wget -m -l 10 -e robots=off -p -k -E --reject-regex "wp" --no-check-certificate -U "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" forum.katera.ru
A trained neural network can be used to generate messages that keep a dialogue going. For any question, immediately give a link to the primary source on this forum.
Swarm
Deploy Inviso
Clone repository
git clone https://github.com/InsiktIntelligence/insikt-deploy-entrega.git
Create Swarm
Now we need to create the swarm nodes (master and slaves) and join them into a cluster. Go to the insikt-deploy-entrega directory and execute swarm.sh:
cd insikt-deploy-entrega/
bash swarm.sh
Note: It is recommended that every swarm node has 8 GB of RAM. Currently some nodes in the file have only 4 GB, because the server it is deployed on does not have enough memory.
Check the result:
docker-machine ls
Expected output: [image]
Swarm deployment
We need to execute the swarm.sh file. Before this step we need to:
- Configure the aws CLI
- Match the swarm certificates directory
- Check which directory the configuration source is read from
Configure aws CLI
Just execute the following command and follow the steps:
aws configure
Match swarm certificates directory
In the deploy.sh file, the following constant must point to the swarm-1 certificates:
export DOCKER_CERT_PATH="ABSOLUTE_PATH_TO_CERT"
Note: Normally, the certificate files are located in ~/.docker/machine/machines/swarm-1
Check which directory the configuration source is read from
Currently, the configuration source paths are hardcoded in the file that creates the services.
This file is in insikt-deploy-entrega/insikt-swarm/ci/docker-staging.yml
A good example would be:
configs:
  frontendconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/nginx/nginx.conf
  kibanaconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/kibana/kibana.yml
  elasticconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/elasticsearch/elasticsearch.yml
  logstashconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/logstash/logstash.conf
In this example, the /home/ubuntu/tmp path should be replaced with the correct one.
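Replacing the hardcoded root by hand is error-prone, so the substitution can be scripted. This is a minimal sketch, assuming every affected entry is a file: line as in the example above; the function rebase_config_paths and the target root /srv/deploy are hypothetical.

```python
def rebase_config_paths(yaml_text, old_root, new_root):
    """Replace the hardcoded root directory in every `file:` line."""
    old_prefix = old_root.rstrip("/") + "/"
    new_prefix = new_root.rstrip("/") + "/"
    out = []
    for line in yaml_text.splitlines():
        # Only touch lines that declare a config source file.
        if line.strip().startswith("file:") and old_prefix in line:
            line = line.replace(old_prefix, new_prefix, 1)
        out.append(line)
    return "\n".join(out)


sample = """configs:
  frontendconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/nginx/nginx.conf
  kibanaconfig:
    file: /home/ubuntu/tmp/insikt-deploy-entrega/insikt-swarm/config/kibana/kibana.yml"""

# /srv/deploy is a placeholder for the directory the repository was cloned into.
rebased = rebase_config_paths(sample, "/home/ubuntu/tmp", "/srv/deploy")
print(rebased)
```

The same function could be applied to the text of insikt-swarm/ci/docker-staging.yml before deploying.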
Once all these prerequisites are done, we can deploy the services.
In the terminal execute:
eval $(docker-machine env swarm-1)
bash swarm.sh
If all went well, the end of the terminal output should look something like:
staging_elasticsearch
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
staging_database
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
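When many services are deployed, checking the output by eye gets tedious. The sketch below parses output shaped like the excerpt above and reports which services converged; it assumes each service name appears on its own line before its verify: line, and the function name converged_services is hypothetical.

```python
import re


def converged_services(output):
    """Map each service name in the deploy output to whether it converged."""
    status = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("verify:"):
            if current is not None:
                status[current] = line == "verify: Service converged"
        elif line.startswith("overall progress:") or re.match(r"^\d+/\d+:", line):
            continue  # progress lines carry no final state
        else:
            current = line  # any other line is treated as a service name
            status.setdefault(current, False)
    return status


sample_output = """staging_elasticsearch
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged
staging_database
overall progress: 1 out of 1 tasks
1/1: running [==================================================>]
verify: Service converged"""

status = converged_services(sample_output)
print(status)
```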
Expose Swarm nodes
Using iptables
To expose the node to the outside, we need to forward the requests. This should be possible using iptables.
The file port_forward.sh in insikt-deploy-entrega/ adds the rules to iptables.
Note: Before using it, check that the variable wan_addr is set to the WAN IP of your server.
How it works:
bash port_forward.sh IP_CONTAINER PORT_CONTAINER
Ex: bash port_forward.sh 192.168.99.100 9090
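The contents of port_forward.sh are not shown in this document. As an illustration only, the sketch below builds the kind of iptables rules that a TCP port forward usually needs (a DNAT rule, a FORWARD accept rule and a MASQUERADE rule). The function port_forward_rules and the WAN address 203.0.113.10 are hypothetical, and the rules are only printed, never executed.

```python
def port_forward_rules(wan_addr, container_ip, port):
    """Build iptables argument lists for a TCP port forward (not executed)."""
    target = f"{container_ip}:{port}"
    return [
        # Rewrite the destination of packets that arrive at the WAN address.
        ["iptables", "-t", "nat", "-A", "PREROUTING", "-p", "tcp",
         "-d", wan_addr, "--dport", str(port),
         "-j", "DNAT", "--to-destination", target],
        # Allow the rewritten packets through the FORWARD chain.
        ["iptables", "-A", "FORWARD", "-p", "tcp",
         "-d", container_ip, "--dport", str(port), "-j", "ACCEPT"],
        # Masquerade so that replies return through this host.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-p", "tcp",
         "-d", container_ip, "--dport", str(port), "-j", "MASQUERADE"],
    ]


rules = port_forward_rules("203.0.113.10", "192.168.99.100", 9090)
for rule in rules:
    print(" ".join(rule))
```

Printing the commands first makes it possible to review them before running anything as root.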
The example call to port_forward.sh forwards requests from your_wan_ip_server:9090 to 192.168.99.100:9090.
If it does not work
If it does not work, there are several things to check:
iptables port forwarding is not enabled
Check whether the option is enabled with sysctl net.ipv4.ip_forward
Execute: sysctl net.ipv4.ip_forward
Expected output: net.ipv4.ip_forward = 1
If it is 0, execute:
sysctl -w net.ipv4.ip_forward=1
printf "net.ipv4.ip_forward = 1\n" >> /etc/sysctl.conf
There is another firewall
On Ubuntu servers ufw is probably enabled. Check it with:
Execute: ufw status
Expected output: Status: inactive
If it is active, execute ufw disable and restart the server.
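The two checks above can be automated by parsing the command output. This is a minimal sketch that works on the text printed by sysctl net.ipv4.ip_forward and ufw status; the function names are hypothetical, and running the commands themselves (for example with subprocess.run) is left out.

```python
def ip_forward_enabled(sysctl_output):
    """Return True if the `sysctl net.ipv4.ip_forward` output reports 1."""
    key, sep, value = sysctl_output.strip().partition("=")
    if not sep or key.strip() != "net.ipv4.ip_forward":
        raise ValueError(f"unexpected sysctl output: {sysctl_output!r}")
    return value.strip() == "1"


def ufw_inactive(ufw_output):
    """Return True if the first line of `ufw status` says the firewall is inactive."""
    lines = ufw_output.strip().splitlines()
    return bool(lines) and lines[0].strip().lower() == "status: inactive"


print(ip_forward_enabled("net.ipv4.ip_forward = 1"))
print(ufw_inactive("Status: inactive"))
```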
Python alerts db
We are going to create the database for the Python alerts. First we need to clone the repository:
git clone https://github.com/InsiktIntelligence/insikt-backend-pyalerts.git
Configuration values are retrieved from a config file. The application reads that config file from the path stored in the environment variable INSIKT_APP_SETTINGS. If that variable is not set, the relative path ../config/default.app.cfg is used. To provide your own values for the config properties listed in the default ./config/default.app.cfg file, make a copy of that file and set the path to this copy as the value of the INSIKT_APP_SETTINGS environment variable. For example, you can create a local copy of the default config:
cd insikt-backend-pyalerts
cp ./config/default.app.cfg ./config/be-local-dev-env.app.cfg
export INSIKT_APP_SETTINGS=../config/be-local-dev-env.app.cfg
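The lookup described above (environment variable first, default path otherwise) can be sketched as follows. This is an illustration of the rule only, not the actual code of insikt-backend-pyalerts; the function resolve_config_path is hypothetical.

```python
import os

DEFAULT_CONFIG = "../config/default.app.cfg"


def resolve_config_path(environ=None):
    """Return the config path from INSIKT_APP_SETTINGS, or the default path."""
    environ = os.environ if environ is None else environ
    # An empty value is treated the same as an unset variable (an assumption).
    return environ.get("INSIKT_APP_SETTINGS") or DEFAULT_CONFIG


print(resolve_config_path({}))
print(resolve_config_path({"INSIKT_APP_SETTINGS": "../config/be-local-dev-env.app.cfg"}))
```

In a Flask application the resolved path could then be passed to app.config.from_pyfile.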
mysql -h 192.168.99.100 -u test -p insikt < mysql-tested.sql
Setting INSIKT_APP_SETTINGS this way instructs Flask to read the configuration from your local copy of the default config file. Note that your local copy will not be tracked by git, because a gitignore rule omits all copies of the default config file except the default config itself. So, if you need to change some default configuration parameters, edit the ./config/default.app.cfg file and commit the changes.
You also need to create the tables language and network_analysis:
CREATE TABLE language (id INT auto_increment PRIMARY KEY, name text, status tinyint(1));
CREATE TABLE network_analysis (id INT auto_increment PRIMARY KEY, project_id VARCHAR(256) DEFAULT NULL, start date, end date, source varchar(1000), status tinyint(4));
Keystore db
The keystore db also needs to be restored. Restore it locally from a dump with:
pg_restore -h 95.216.97.242 -U postgres -W insikt1.sql -d insikt
Note: The database must already exist.
Create Elasticsearch demo index
We need to expose Kibana with the port_forward.sh file:
bash port_forward.sh 192.168.99.100 5601
After this, open Kibana in a browser at IP_SERVER:5601. Once Kibana is loaded, go to the sidebar menu and click on Dev Tools. You will see the Kibana console.
Copy this code to create the demo index.
PUT demo
{
"settings": { "number_of_shards": 6, "number_of_replicas": 1, "analysis": { "analyzer": { "default": { "type": "standard", "tokenizer": "lowercase", "filter": [ "asciifolding" ] } } }, "index.requests.cache.enable": true }, "mappings": { "tweet": { "_source": { "enabled": true },
"properties": { "analysis": { "properties": { "threatScore": { "type": "long", "doc_values": true }, "concepts": { "properties": { "concept": { "type": "keyword", "index": true, "doc_values": true } } }, "docSentiment": { "type": "double", "index": true, "doc_values": true },
"emotions": { "properties": { "emotion": { "type": "keyword", "index": true, "doc_values": true } } }, "entities": { "type": "nested", "properties": { "entity": { "type": "keyword", "index": true, "doc_values": true }, "entityType": { "type": "keyword", "index": true, "doc_values": true }, "type": { "type": "keyword", "index": true, "doc_values": true }
} }, "hashtags": { "properties": { "text": { "type": "keyword", "index": true, "doc_values": true } } }, "keyIdeas": { "properties": { "keyIdea": { "type": "keyword", "index": true, "doc_values": true } } },
"screenName": { "type": "keyword", "index": true, "doc_values": true }, "topics": { "type": "nested", "properties": { "topic": { "type": "keyword", "index": true, "doc_values": true }, "category": { "type": "keyword", "index": true, "doc_values": true } } } } },
"createdAt": { "type": "date", "index": true, "doc_values": true, "format": "dateOptionalTime" }, "detectedLang": { "type": "keyword", "index": true, "doc_values": true }, "geoLocation": { "properties": { "latitude": { "type": "double", "index": true, "doc_values": true }, "longitude": { "type": "double", "index": true, "doc_values": true } } }, "coordinates": {
"index": true, "type": "geo_point" }, "geoname": { "properties": { "countryCode": { "type": "keyword", "index": true, "doc_values": true }, "geonameid": {
"type": "integer", "index": true, "doc_values": true }, "name": {
"type": "keyword", "index": true, "doc_values": true } } },
"hashtagEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "start": { "type": "long", "doc_values": true }, "text": { "type": "keyword", "index": false, "doc_values": true } } },
"id": { "type": "keyword", "index": true, "doc_values": true }, "idLong": { "type": "long", "doc_values": true }, "mediaEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "mediaURL": { "type": "keyword", "index": false, "doc_values": true }, "mediaURLHttps": { "type": "keyword", "index": false, "doc_values": true },
"start": { "type": "long", "doc_values": true } } }, "place": { "properties": { "boundingBoxCoordinates": { "properties": { "latitude": { "type": "double", "index": true, "doc_values": true }, "longitude": { "type": "double", "index": true, "doc_values": true } } },
"boundingBoxType": { "type": "keyword", "index": true, "doc_values": true }, "country": { "type": "keyword", "index": true, "doc_values": true }, "countryCode": { "type": "keyword", "index": true, "doc_values": true }, "fullName": { "type": "keyword", "index": true, "doc_values": true },
"id": { "type": "keyword", "index": true, "doc_values": true }, "name": { "type": "keyword", "index": true, "doc_values": true }, "placeType": { "type": "keyword", "index": true, "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } },
"retweetedStatus": { "properties": { "createdAt": { "type": "date",
"index": true, "doc_values": true, "format": "dateOptionalTime"
}, "geoLocation": { "properties": { "latitude": {
"type": "double", "doc_values": true },
"longitude": { "type": "double", "doc_values": true } } },
"hashtagEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "start": { "type": "long", "doc_values": true }, "text": { "type": "keyword", "index": true, "doc_values": true } } }, "id": { "type": "keyword", "index": true, "doc_values": true },
"mediaEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "mediaURL": { "type": "keyword", "index": false, "doc_values": true }, "mediaURLHttps": { "type": "keyword", "index": false, "doc_values": true }, "start": { "type": "long", "doc_values": true } } },
"place": { "properties": { "boundingBoxCoordinates": { "properties": { "latitude": { "type": "double", "doc_values": true }, "longitude": { "type": "double", "doc_values": true } } }, "boundingBoxType": { "type": "keyword", "index": false, "doc_values": true }, "country": { "type": "keyword", "index": true, "doc_values": true },
"countryCode": { "type": "keyword", "index": true, "doc_values": true }, "fullName": { "type": "keyword", "index": true, "doc_values": true }, "id": { "type": "keyword", "index": false, "doc_values": true }, "name": { "type": "keyword", "index": true, "doc_values": true },
"placeType": { "type": "keyword", "index": true, "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } }, "source": { "type": "keyword", "index": true }, "symbolEntities": { "properties": { "end": { "type": "long", "doc_values": true },
"start": { "type": "long", "doc_values": true }, "text": { "type": "keyword", "index": true } } }, "text": { "type": "keyword", "index": true }, "urlEntities": { "properties": { "displayURL": { "type": "keyword", "index": false, "doc_values": true }, "end": { "type": "long", "doc_values": true },
"expandedURL": { "type": "keyword", "index": false, "doc_values": true }, "start": { "type": "long", "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } },
"user": { "properties": { "createdAt": { "type": "date", "doc_values": true, "format": "dateOptionalTime" }, "description": { "type": "keyword", "index": true }, "favouritesCount": { "type": "long", "index": true, "doc_values": true }, "followersCount": { "type": "long", "index": true, "doc_values": true },
"friendsCount": { "type": "long", "index": true, "doc_values": true }, "id": { "type": "keyword", "index": true, "doc_values": true },
"lang": { "type": "keyword", "index": true, "doc_values": true }, "location": { "type": "keyword", "index": true, "doc_values": true },
"name": { "type": "keyword", "index": true, "fields": { "raw": { "type": "keyword", "index": true, "doc_values": true } } }, "profileImageUrl": { "type": "keyword", "index": false, "doc_values": true }, "screenName": { "type": "keyword", "index": true, "doc_values": true },
"statusesCount": { "type": "long", "index": true, "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } }, "userMentionEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "id": { "type": "long", "doc_values": true },
"name": { "type": "keyword", "index": true }, "screenName": { "type": "keyword", "index": true, "doc_values": true }, "start": { "type": "long", "doc_values": true } } } } }, "savedAt": { "type": "date", "doc_values": true, "format": "dateOptionalTime" },
"source": { "type": "keyword", "index": true }, "symbolEntities": { "properties": { "end": { "type": "long", "doc_values": true }, "start": { "type": "long", "doc_values": true }, "text": { "type": "keyword" } } }, "text": { "type": "keyword", "index": true },
"unifiedText": { "type": "text", "index": true }, "unifiedUrls": { "type": "keyword" }, "urlEntities": { "properties": { "displayURL": { "type": "keyword", "index": false, "doc_values": true }, "end": { "type": "long", "doc_values": true }, "expandedURL": { "type": "keyword", "index": false, "doc_values": true },
"start": { "type": "long", "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } }, "user": { "properties": { "createdAt": { "type": "date", "doc_values": true, "format": "dateOptionalTime" }, "description": { "type": "keyword", "index": true },
"favouritesCount": { "type": "long", "doc_values": true }, "followersCount": { "type": "long", "doc_values": true }, "friendsCount": { "type": "long", "doc_values": true }, "id": { "type": "keyword", "index": true, "doc_values": true }, "lang": { "type": "keyword", "index": true, "doc_values": true },
"location": { "type": "keyword", "index": true, "doc_values": true }, "name": { "type": "keyword", "index": true, "fields": { "raw": { "type": "keyword", "index": false, "doc_values": true } } }, "profileImageUrl": { "type": "keyword", "index": false, "doc_values": true },
"screenName": { "type": "keyword", "index": true, "doc_values": true }, "statusesCount": { "type": "long", "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true }, "urlEntity": { "properties": { "displayURL": { "type": "keyword", "index": false, "doc_values": true }, "end": { "type": "long", "doc_values": true },
"expandedURL": { "type": "keyword", "index": false, "doc_values": true }, "start": { "type": "long", "doc_values": true }, "url": { "type": "keyword", "index": false, "doc_values": true } } } } },
"userMentionEntities": { "properties": { "end": { "type": "long" }, "id": { "type": "long" }, "name": { "type": "keyword" }, "screenName": { "type": "keyword", "index": true, "doc_values": true }, "start": { "type": "long" } } } } } }
}
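The same index creation can also be scripted instead of pasted into the Kibana console. The sketch below only builds the HTTP request with the standard library and does not send it; the Elasticsearch address http://192.168.99.100:9200 is an assumption, the mapping is shortened to a single field for brevity, and build_create_index_request is a hypothetical helper.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, body):
    """Build (but do not send) the PUT request that creates an index."""
    url = f"{base_url.rstrip('/')}/{index}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Shortened body; the full settings and mappings are listed above.
body = {
    "settings": {"number_of_shards": 6, "number_of_replicas": 1},
    "mappings": {"tweet": {"properties": {"detectedLang": {"type": "keyword"}}}},
}

request = build_create_index_request("http://192.168.99.100:9200", "demo", body)
print(request.get_method(), request.full_url)
# To actually create the index: urllib.request.urlopen(request)
```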
Create a new project
Once the ES index is created, it is time to create an Inviso project.
Expose the frontend
First we need to expose the Inviso frontend in order to log in to the app.
bash port_forward.sh 192.168.99.100 443
Then go to the following URL: https://SERVER_IP
You will see the login screen. Log in with:
User: parronator
Password: 123456Aa
Once you are logged in, follow these steps to create a project:
- Click on the button Add a new project
[Image: Key project]
- Fill in the project options as you prefer, but twitter must be selected in the Sources option
- Click on the Create button
You will see a green toast indicating that the project was created, and you will be redirected to the projects dashboard, which shows the new project.
Load test data
In the insikt-deploy-entrega/ directory there is a file named test_smokees.py. You just need to execute it with:
python test_smokees.py
Note: If you get a "Magic numbers" error, delete the files with the .pyc extension in the same directory as the test_smokees.py file. You can use:
find . -name \*.pyc -delete
Also, inside test_smokees.py, delete:
from insiktes import search, search2, search4, search_top, \
mysql_elk, get_id_list, get_source_list, get_top_list, notification
To check that it worked, go to Dev Tools in Kibana and execute:
GET demo/_search
{ "query": { "match_all": {} } }
If it worked, the total number of hits in the result should be at least 32.
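The hit-count check can be done in code on the parsed JSON response of the search. This is a minimal sketch: it accepts both response shapes Elasticsearch uses for hits.total (a plain number in older versions, an object with a value field in newer ones); the function names and the sample responses are made up for illustration.

```python
def total_hits(response):
    """Return the total hit count from a parsed _search response."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # newer Elasticsearch: {"value": n, "relation": ...}
        return total["value"]
    return total  # older Elasticsearch: a plain integer


def smoke_data_loaded(response, minimum=32):
    """Return True if the response reports at least `minimum` hits."""
    return total_hits(response) >= minimum


# Made-up sample responses, one per shape.
old_style = {"hits": {"total": 32, "hits": []}}
new_style = {"hits": {"total": {"value": 10, "relation": "eq"}, "hits": []}}

print(smoke_data_loaded(old_style))
print(smoke_data_loaded(new_style))
```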