Natural Language Processing

Working with unstructured data

Jose Luis Ayerdis

Natural language processing

Text classification

Language detection

Translation systems

Question answering systems

Word segmentation

Automatic summarization

Named entity recognition

Text generation

Sentiment analysis

Topic inference

Structured data


Unstructured data

Free text

  • Comments
  • Social media posts
  • Emails and chats
  • Text documents
  • Application logs
  • Sensor data
NLP vs Text Mining

Python libraries



  • Tokenization
  • Stemming and lemmatization
  • Part-of-speech (POS) tagging


The road to creativity passes so close to the madhouse and often detours or ends there
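As a minimal sketch of tokenization, the sentence above can be split into tokens using only the Python standard library. The function names `whitespace_tokenize` and `regex_tokenize` are illustrative choices, not part of any NLP library, and real tokenizers handle many more cases.

```python
import re

SENTENCE = (
    "The road to creativity passes so close to the madhouse "
    "and often detours or ends there"
)

def whitespace_tokenize(text):
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()

def regex_tokenize(text):
    """Keep each run of word characters as a token and drop everything else."""
    return re.findall(r"\w+", text)

tokens = whitespace_tokenize(SENTENCE)
print(tokens)
print(len(tokens))
```

For this sentence both functions give the same tokens, because it contains no punctuation. They start to disagree as soon as commas, hyphens, or apostrophes appear.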


Common problems

  • Splitting a word into more than one token can lead to odd cases (Objective-C)
  • Splitting on whitespace does not work well outside Indo-European languages (عشوائي)
  • English contractions such as don't and won't
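These problems can be reproduced with a few lines of standard-library Python. The sketch below is illustrative only: the function names are hypothetical, and the Chinese sentence (roughly "I like natural language processing") is an added example of text written without spaces, not taken from the slides.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace only."""
    return text.split()

def word_tokenize(text):
    """Treat every run of word characters as a token, so punctuation splits words."""
    return re.findall(r"\w+", text)

def joined_tokenize(text):
    """Also allow internal apostrophes and hyphens inside a token."""
    return re.findall(r"\w+(?:['-]\w+)*", text)

text = "I don't use Objective-C"
print(word_tokenize(text))    # "don't" and "Objective-C" are broken apart
print(joined_tokenize(text))  # both are kept as single tokens

# Text written without spaces comes back as one single "token".
no_spaces = "我喜欢自然语言处理"
print(whitespace_tokenize(no_spaces))
```

Keeping hyphens and apostrophes inside tokens is itself a trade-off: it fixes these cases but would also glue together hyphenated phrases that some applications prefer to split.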


Escribiendo (writing)
Escrito (written)


Stemming: automatically finding the root of a token by applying deterministic rules.
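A rule-based approach of this kind can be sketched as simple suffix stripping. The suffix list and the `toy_stem` function below are hypothetical and far smaller than the rule sets used by real stemmers. They only show how purely deterministic rules behave on the Spanish words "Escribiendo" and "Escrito".

```python
# Hypothetical suffix rules, ordered longest first (illustration only).
SUFFIXES = ("iendo", "ando", "ido", "ado", "o")

def toy_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    word = token.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for w in ("Escribiendo", "Escrito"):
    print(w, "->", toy_stem(w))
```

The two forms of the same verb end up with different stems ("escrib" and "escrit"), which is a typical limitation of rules that know nothing about the language beyond word endings.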




Lemmatization: automatically finding the root of a token using linguistic methods such as part-of-speech tagging.
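One way to picture the role of part-of-speech information is a dictionary lookup keyed by both the word form and its tag. The mini-lexicon, the tag names, and `toy_lemmatize` below are assumptions made for illustration. Real lemmatizers rely on full morphological dictionaries or trained models.

```python
# Hypothetical mini-lexicon: (word form, POS tag) -> lemma.
LEXICON = {
    ("escribiendo", "VERB"): "escribir",
    ("escrito", "VERB"): "escribir",
    ("escrito", "NOUN"): "escrito",
}

def toy_lemmatize(token, pos):
    """Return the lemma for a token given its POS tag, or the lowercased token if unknown."""
    word = token.lower()
    return LEXICON.get((word, pos), word)

print(toy_lemmatize("Escribiendo", "VERB"))
print(toy_lemmatize("Escrito", "VERB"))
print(toy_lemmatize("Escrito", "NOUN"))
```

Here the same surface form "escrito" maps to different lemmas depending on whether it is tagged as a verb or as a noun, something that suffix rules alone cannot decide.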


Thank you all

@necronet

Slides created with reveal.js