Data Engineer Resume
Houston, TX
SUMMARY
- 7+ years of professional software development experience with specialization in Big Data Engineering, Analytics, and Java projects.
- Hands-on experience working with Spark and the Hadoop ecosystem, including MapReduce, HDFS, Sqoop, Hive, Kafka, Oozie, YARN, Impala, Pig, Flume, and the NoSQL database HBase.
- Excellent knowledge and understanding of distributed computing and parallel processing frameworks.
- Strong experience working with both batch and streaming processing using the Spark framework.
- Good experience working with Kafka clusters to store real-time streaming data and writing custom Kafka producers and Spark Streaming consumers.
- Experience installing, configuring, and monitoring Hadoop clusters both on-premises and in the cloud.
- Strong experience building data lakes in the AWS Cloud using services such as S3, EMR, Glue Metastore, Athena, Redshift, and Step Functions.
- Strong experience and knowledge of real-time data analytics using Kafka and Spark Streaming.
- Expertise in developing production-ready Spark applications using the Spark RDD, DataFrame, Spark SQL, and Spark Streaming APIs (a minimal streaming sketch follows this summary).
- Strong hands-on knowledge of Scala and Python for developing Spark applications.
- Good experience troubleshooting data pipeline failures and identifying bottlenecks in long-running pipelines.
- Good experience productionizing and automating end-to-end data pipelines and enabling downstream applications to consume data from data lakes in an optimized fashion.
- Strong experience working with various file formats such as Parquet, ORC, Avro, and JSON.
- Strong experience using Hive features such as managed and external tables, partitioning, and bucketing.
- Extended Hive core functionality by writing custom UDFs for data analysis.
- Developed multiple Kafka producers and consumers from scratch per software requirement specifications.
- Proficient in importing/exporting data between RDBMS and HDFS using Sqoop.
- Hands-on experience with Apache NiFi and Apache Airflow.
- Created and ran workflow DAGs using Apache Airflow.
- Hands-on experience creating Docker containers for microservice REST applications.
- Strong experience working with Core Java and Spring Boot for developing REST APIs, along with JDBC, JEE technologies, and Servlets.
- Experience with version control systems (SVN, Git/GitHub) and issue-tracking tools such as Jira.
- Extensive experience working with relational databases such as PostgreSQL, Teradata, and MySQL.
- Worked in Agile/Scrum software development.
- Ability to meet deadlines and handle pressure while coordinating multiple tasks.
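To illustrate the Kafka and Spark Streaming work summarized above, the following is a minimal Spark Structured Streaming sketch in Scala that reads a Kafka topic and lands it as partitioned Parquet in an S3 data lake path. The broker address, topic name, and S3 paths are placeholders, the Kafka connector dependency is assumed to be on the classpath, and the production pipelines included schema handling and monitoring not shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamToDataLake {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickstream-to-data-lake")
      .getOrCreate()

    // Read the raw event stream from Kafka; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "user-events")
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers binary key/value columns; cast the payload to string
    // and stamp each record with an ingestion date for partitioning.
    val events = raw
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
      .withColumn("ingest_date", to_date(col("timestamp")))

    // Land the stream as partitioned Parquet in an S3-backed data lake path.
    val query = events.writeStream
      .format("parquet")
      .option("path", "s3a://example-data-lake/raw/user_events/")
      .option("checkpointLocation", "s3a://example-data-lake/checkpoints/user_events/")
      .partitionBy("ingest_date")
      .start()

    query.awaitTermination()
  }
}
```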
TECHNICAL SKILLS
Big Data Tools: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Ambari, Storm, Spark, and Kafka
NoSQL: HBase, Cassandra, MongoDB
Build and Deployment Tools: Maven, sbt, Git, SVN, Jenkins
Programming and Scripting: Java, Scala, Python, SQL, Shell Scripting, Pig Latin, HiveQL
Databases: Teradata, Redshift, Oracle, MySQL, PostgreSQL
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
AWS Services: EC2, EMR, S3, Redshift, Lambda, Glue, Simple Workflow, Athena
PROFESSIONAL EXPERIENCE
Confidential, Houston, TX
Data Engineer
Responsibilities:
- Ingested large volumes of user behavioral data and customer profile data into the analytics data store.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Developed Scala-based Spark applications for data cleansing, event enrichment, aggregation, de-normalization, and data preparation for machine learning and reporting teams to consume.
- Troubleshot Spark applications to make them more fault tolerant.
- Fine-tuned Spark applications to improve overall pipeline processing time.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics (a minimal producer sketch follows this list).
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to Snowflake.
- Handled large datasets using Spark's in-memory capabilities, broadcast variables, and effective, efficient joins and transformations.
- Worked extensively with Sqoop for importing data from Oracle.
- Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
- Worked with EMR clusters in the AWS Cloud along with S3, Redshift, and Snowflake.
- Created Hive tables and loaded and analyzed data using Hive scripts.
- Implemented partitioning, dynamic partitions, and bucketing in Hive.
- Set up continuous integration of applications using Bamboo.
- Used reporting tools such as Tableau connected to Impala to generate daily data reports.
- Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
- Designed and documented operational problems following standards and procedures using Jira.
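A minimal sketch of the kind of REST-to-Kafka producer described above, in Scala. The REST endpoint, broker address, and topic name are placeholders; it assumes Java 11's HttpClient and the standard Kafka client library, and omits the scheduling, retry, and error-handling logic of the production jobs.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

object RestToKafkaProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full acknowledgment

    val producer = new KafkaProducer[String, String](props)
    val http = HttpClient.newHttpClient()

    try {
      // Poll the (placeholder) REST endpoint and forward the response body to a Kafka topic.
      val request = HttpRequest.newBuilder(URI.create("https://api.example.com/events")).GET().build()
      val response = http.send(request, HttpResponse.BodyHandlers.ofString())
      if (response.statusCode() == 200) {
        producer.send(new ProducerRecord[String, String]("user-events", response.body()))
      }
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```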
Environment: Hadoop, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, Amazon EMR, YARN, Jira, AWS, Shell Scripting, sbt, GitHub, Maven.
Confidential, Richmond, VA
Big Data Developer
Responsibilities:
- Developed Spark applications in Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Handled large datasets using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations.
- Used Spark to implement transformations on historical data.
- Used PySpark with Python scripting for data analysis and aggregation, working with DataFrames and the Spark SQL API to process data.
- Used the Spark programming API on an EMR cluster running Hadoop YARN to meet various data processing requirements.
- Ran DAGs using Apache Airflow to structure batch jobs efficiently.
- Developed Spark Scala applications using RDDs, DataFrames, and Spark SQL for data aggregation and queries, writing results back to the OLTP system via Spark JDBC (a minimal sketch follows this list).
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Configured Spark Streaming to receive real-time data from Kafka and write the processed stream data back to Kafka.
- Wrote real-time processing jobs using Spark Streaming with Kafka.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Developed Hive queries to process data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Worked extensively with S3 buckets in AWS.
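A minimal sketch of a Spark Scala aggregation that writes results back to an OLTP system over Spark JDBC, as described above. The S3 path, column names, JDBC URL, and table name are placeholders, and credentials are read from the environment for illustration only.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions._

object DailyOrderAggregates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("daily-order-aggregates").getOrCreate()

    // Read curated order data from the data lake (placeholder S3 path).
    val orders = spark.read.parquet("s3a://example-data-lake/curated/orders/")

    // Aggregate order amounts per customer per day.
    val dailyTotals = orders
      .groupBy(col("customer_id"), to_date(col("order_ts")).as("order_date"))
      .agg(sum("order_amount").as("total_amount"), count("*").as("order_count"))

    // Write the aggregates back to an OLTP table over Spark JDBC;
    // URL, table, and credentials are placeholders.
    dailyTotals.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://oltp-host:5432/reporting")
      .option("dbtable", "daily_order_aggregates")
      .option("user", sys.env.getOrElse("DB_USER", "reporting_user"))
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .mode(SaveMode.Overwrite)
      .save()

    spark.stop()
  }
}
```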
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, S3, Hive, Apache Kafka, Java, Scala, Shell scripting, Jenkins, Eclipse, Git, Tableau, MySQL, and Agile methodologies.
Confidential, Jersey City, NJ
Hadoop Developer
Responsibilities:
- Built scalable distributed data solutions in a Hadoop cluster environment with the Cloudera distribution.
- Converted raw data to efficient storage formats such as Avro and Parquet to reduce data processing time and improve network transfer efficiency (a minimal sketch follows this list).
- Built end-to-end data pipelines on Hadoop data platforms.
- Applied normalization and de-normalization techniques for optimal performance in relational and dimensional database environments.
- Designed, developed, and tested Extract-Transform-Load (ETL) applications with different types of sources.
- Created files and tuned SQL queries in Hive using Hue; implemented MapReduce jobs in Hive by querying the available data.
- Used Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Used PySpark with Python scripting for data analysis.
- Converted HiveQL into Spark transformations using Spark RDDs and Scala.
- Created user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) in Pig and Hive.
- Built custom ETL workflows using Spark/Hive to perform data cleaning and mapping.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Supported the cluster and topics via Kafka Manager; worked on CloudFormation scripting, security, and resource automation.
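A minimal sketch of converting raw delimited data into Parquet and Avro with Spark, as described above. The HDFS paths are placeholders, schema inference is used only for brevity, and the Avro write assumes the spark-avro package is on the classpath.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object RawToColumnar {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("raw-to-columnar").getOrCreate()

    // Read raw delimited files landed on HDFS (placeholder path; schema inference for brevity).
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/transactions/")

    // Rewrite as Parquet for efficient columnar scans downstream.
    raw.write.mode(SaveMode.Overwrite).parquet("hdfs:///data/processed/transactions_parquet/")

    // Rewrite as Avro as well; requires the spark-avro package on the classpath.
    raw.write.mode(SaveMode.Overwrite).format("avro").save("hdfs:///data/processed/transactions_avro/")

    spark.stop()
  }
}
```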
Environment: Python, Cloudera, HDFS, MapReduce, Flume, Kafka, ZooKeeper, Pig, Hive, HQL, HBase, Spark, ETL, REST services.
Confidential
Hadoop/Java Developer
Responsibilities:
- Installed and configured Apache Hadoop to test maintenance of log files in the Hadoop cluster.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Developed MapReduce programs for data analysis and data cleaning.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Developed industry-specific user-defined functions (UDFs) (a minimal Hive UDF sketch follows this list).
- Created Hive tables, loaded data, and wrote Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to simplify data manipulation.
- Developed Hive queries to process the data for downstream data analysis.
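A minimal sketch of a Hive UDF of the kind described above, shown in Scala for consistency with the other sketches (the original work was Java-based, per the environment); the class name and normalization logic are purely illustrative.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Illustrative Hive UDF that trims and upper-cases a string column.
// Register in Hive after packaging the jar, e.g.:
//   ADD JAR hive-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```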
Environment: Apache Hadoop, HDFS, Cloudera Manager, CentOS, Java, MapReduce, Eclipse, Hive, PIG, Sqoop, Oozie and SQL.
Confidential
Java Developer
Responsibilities:
- Designed and developed applications using the Spring MVC framework with Agile methodology.
- Developed JSP and HTML pages using CSS and JavaScript as part of the presentation layer.
- Used the Hibernate framework in the persistence layer to map the object-oriented domain model to the database.
- Developed database schema and SQL queries for querying, inserting, and managing database.
- Implemented various design patterns in the project such as Data Transfer Object, Data Access Object and Singleton.
- Used Git for Source Code Management.
- Used Maven scripts to fetch, build, and deploy the application to the development environment.
- Created a RESTful web service interface to a Java-based runtime engine.
- Used Apache Tomcat for deploying the application.
- Used Junit for functional and unit testing of code.
Environment: Eclipse IDE, Java/J2EE, Spring, Hibernate, JSP, HTML, CSS, JavaScript, Maven, RESTful Web services, Apache Tomcat, Oracle, JUnit, Git