DataOps Engineer

Job description

The DataOps engineer maintains and prepares data processing infrastructure to support large and complex use cases throughout the enterprise. The person in this role creates scalable and reusable solutions for gathering, storing, processing, and serving data at both large and very large (i.e., Big Data) scales. These solutions can span any of the following domains: ETL, business intelligence, analytics, persistence (relational, NoSQL, data lakes), search, data warehousing, stream processing, and machine learning.

Role Responsibilities

  • Assists in the development of large-scale data structures and pipelines to organize, collect, and standardize data that helps generate insights and address reporting needs.
  • Applies understanding of key business drivers to accomplish own work.
  • Writes ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing.
  • Integrates data from a variety of sources, ensuring that it adheres to data quality and accessibility standards.
  • Plans and works on internal projects as needed, including legacy system replacement, monitoring and analytics improvements, tool development, and technical documentation.


Qualifications

  • Ability to perform deep-dive technical troubleshooting in critical situations.
  • Team player who is disciplined and has great attention to detail.
  • Strong communication skills with both technical and non-technical peers.
  • Ability to juggle multiple tasks at once.
  • Interest in distributed and highly available systems.
  • Ability to leverage multiple tools and programming languages to analyze and manipulate data sets from disparate data sources.
  • Ability to understand complex systems and solve challenging analytical problems.
  • Experience building data transformation and processing solutions.
  • Expertise in Hadoop architecture.
  • Expertise in batch and stream processing systems (e.g., Spark, Flink, Airflow).
  • Good understanding of Linux-based operating systems.
  • Familiarity with data pipeline and data analysis ecosystems.
  • Strong scripting skills, preferably in Python or Java.
  • Familiarity with configuration management and CI/CD pipelines/tools.
  • Proficiency in at least one general-purpose programming language (Python, Java, Scala, Go).
  • Familiarity with Linux containers and container orchestration systems (e.g., Docker, Kubernetes).
  • Familiarity with Elasticsearch.