Big Data Engineer Resume

SUMMARY:

  • Five years of development experience, with a focus on Big Data tools and technologies. Flexible and adaptable to a wide range of technologies and team environments, and dedicated to continuous learning to maximize productivity and respond quickly to changing business needs. Highly motivated and committed, with strong complex problem-solving and teamwork skills.

TECHNICAL SKILLS:

Hadoop Core Services: HDFS, MapReduce, Spark, YARN

Hadoop Distribution: Hortonworks, Cloudera, Apache

NoSQL Databases: HBase (Apache Phoenix), Cassandra

Hadoop Data Services: Hive, Pig, Sqoop, Flume

Hadoop Operational Services: Zookeeper, Oozie, Zena

Monitoring Tools: Ganglia, Cloudera Manager

Cloud Computing Tools: AWS S3, EC2, ELB, Redshift, Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL, Azure Data Migration

Languages: C, Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB

Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Databases: Oracle, MySQL, PostgreSQL, Teradata

Operating Systems: UNIX, Linux, Windows

Build Tools: Jenkins, Maven, ANT

Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans, Git, WinSCP, PuTTY, Palantir Foundry

Development methodologies: Agile/Scrum, Waterfall

Visualization and Analytics Tools: Tableau, MSBI, Palantir Foundry, QlikView

PROFESSIONAL EXPERIENCE:

Big Data Engineer

Confidential

Responsibilities:

  • Adhere to Agile methodology and follow its processes for maintenance, support, and documentation, including knowledge-sharing sessions, code reviews, and design reviews.
  • Collaborate with senior resources to ensure consistent development practices and communicate complex problem-solving ideas to senior management and peers.
  • Build and maintain high-performance, scalable, fault-tolerant, and secure distributed software systems using the Hadoop framework and complementary technologies such as Apache Spark with Python and PySpark.
  • Build data pipelines using cloud or on-premises technologies (AWS, Hadoop, SQL/NoSQL, and unstructured databases) to ingest, load, transform, group, logically join, and assemble data for analysis, analytics, and reporting purposes.
  • Automate data ingestion from a variety of data sources using snapshot and incremental methodologies, applying PySpark transformations for data cleansing, schema validation, and schema evolution support (see the sketch after this list).
  • Implement complex big data solutions focused on collecting, parsing, managing, analyzing, and visualizing large data sets across multiple platforms to turn information into insights.
  • Create multiple intermediate data sets and write Hive queries that run on the MapReduce framework.
  • Produce unit tests for Spark transformations and helper methods.
  • Develop utilities that increase the value of data by updating data sets through an incremental process that automatically retrieves data from sources and feeds target data sets.
  • Reduce manual workarounds and enhance the operational efficiency and accuracy of reporting and analytical capabilities.
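
As an illustration of the incremental ingestion and schema-validation work described above, the following is a minimal PySpark sketch; the paths, column names, and watermark value are hypothetical assumptions, not the actual project code.

```python
# Minimal sketch: incremental ingestion with a schema check in PySpark.
# All paths, column names, and the watermark value below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("incremental_ingest").getOrCreate()

# Expected schema applied on read so malformed columns surface early.
expected_schema = StructType([
    StructField("member_id", StringType(), False),
    StructField("event_type", StringType(), True),
    StructField("updated_at", TimestampType(), True),
])

# Incremental load: pick up only records newer than the last successful run
# (the watermark would normally come from a control table; hard-coded here).
last_watermark = "2023-01-01 00:00:00"
source_df = (
    spark.read.schema(expected_schema)
    .parquet("s3://example-bucket/raw/members/")      # hypothetical source path
    .where(F.col("updated_at") > F.lit(last_watermark))
)

# Basic cleansing: drop rows that fail a not-null check on the key column.
clean_df = source_df.where(F.col("member_id").isNotNull())

# Append the new slice to the curated target data set.
clean_df.write.mode("append").parquet("s3://example-bucket/curated/members/")
```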

Data Engineer

Confidential

Responsibilities:

  • Developed scripts to automate data ingestion from multiple data sources using Sqoop jobs.
  • Automated flattening of JSON data from the data lake and feeding it into Hive tables, on top of which data analytics are performed (see the sketch after this list).
  • Developed multiple Spark jobs for data cleansing, data transformation, data monitoring, and capturing changing dimensions.
  • Automated workflows using QlikView to develop data visualizations and dashboards.
  • Worked with Scala and Python as primary languages to develop Spark transformations.
  • Developed a scalable architecture that processes both current and historical data and logically joins it to other modules, enabling analysis across multiple data dimensions.
  • Automated and scheduled scripts to run on both time-based and event-based triggers.
  • Worked within Agile methodology, including developing support documentation and participating in code reviews and design reviews.
  • Developed, refined, and scaled data management and analytics procedures, systems, workflows, and best practices.
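
A minimal sketch of the JSON-flattening step into Hive, assuming hypothetical input paths, nested field names, and a hypothetical target table; the real job's structure would differ.

```python
# Minimal sketch: flatten nested JSON from the data lake into a Hive table.
# The input path, nested field names, and target table are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("json_flatten")
    .enableHiveSupport()          # needed so saveAsTable writes to the Hive metastore
    .getOrCreate()
)

raw_df = spark.read.json("hdfs:///data/lake/events/")    # hypothetical landing zone

# Pull nested attributes up to top-level columns and explode any array fields.
flat_df = (
    raw_df
    .withColumn("item", F.explode_outer("items"))         # assumes an 'items' array
    .select(
        F.col("event_id"),
        F.col("user.id").alias("user_id"),                # assumes a nested 'user' struct
        F.col("item.sku").alias("sku"),
        F.col("item.price").alias("price"),
    )
)

# Persist as a Hive table so downstream analytics can query it with HiveQL.
flat_df.write.mode("overwrite").saveAsTable("analytics.events_flat")
```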

Big Data Developer

Confidential

Responsibilities:

  • Automated processing of state files by cleaning and streamlining data files, providing HCSC with sensitive member information for clinical engagement.
  • Reduced manual workarounds and enhanced the operational efficiency and accuracy of reporting and analytical capabilities.
  • Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the streaming sketch after this list).
  • Developed the code required to generate end-to-end reports for a quick-reporting platform by executing Spark/Scala jobs.
  • Wrote real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system.
  • Used Kafka capabilities such as partitioning, replication, and its distributed commit log to maintain messaging feeds.
  • Used Apache Kafka to aggregate data from multiple servers and make it available to downstream systems for data analysis and engineering use cases.
  • Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.
  • Worked with Scala/Spark SQL for data cleansing, generating DataFrames and transforming them into row-level DataFrames to populate aggregate tables.
  • Automated file transfers to and from state agencies and third-party vendors, reducing data latency and improving the consistency of data transfers, ensuring key HFS correspondence data is managed in the data warehouse rather than in Excel spreadsheets.
  • Analyzed and developed the GDP component of the future-state enterprise data foundation, including but not limited to the creation of claims data repositories.
  • The purpose of these repositories is to provide a single source of truth, eliminating discrepancies in what is provided and consumed by providers, vendors, and regulatory agencies.
  • Worked on automating file auditing and balancing across different nodes, as well as generating reports and notifications.
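
The Kafka consumer and streaming jobs described above were written in Scala; purely as an illustration (and to keep all sketches in one language), the sketch below shows a comparable pipeline using Spark Structured Streaming in PySpark. Broker addresses, the topic name, the payload schema, and output paths are hypothetical assumptions.

```python
# Minimal sketch: consume a Kafka topic with Spark Structured Streaming and land it in HDFS.
# Requires the spark-sql-kafka connector package on the Spark classpath.
# Broker, topic, schema, and paths below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("kafka_consumer").getOrCreate()

# Schema of the JSON payload carried in the Kafka message value.
payload_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("status", StringType()),
])

stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
    .option("subscribe", "claims-topic")                  # hypothetical topic
    .load()
)

# Kafka delivers the value as binary; decode it and parse the JSON payload.
parsed_df = (
    stream_df
    .select(F.col("value").cast("string").alias("json"))
    .select(F.from_json("json", payload_schema).alias("data"))
    .select("data.*")
)

# Write the parsed records to HDFS with checkpointing for fault tolerance.
query = (
    parsed_df.writeStream.format("parquet")
    .option("path", "hdfs:///data/claims/")
    .option("checkpointLocation", "hdfs:///checkpoints/claims/")
    .start()
)
query.awaitTermination()
```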
