Data Engineer Resume

Atlanta, GA

SUMMARY

  • Over 6 years of professional IT experience in business analysis and Big Data ecosystem technologies, including Hadoop HDFS, MapReduce, Cloudera, Hive, and Oozie, with full project development, implementation, and deployment on Unix, Windows, and Linux.
  • Expertise in HiveQL and Spark Streaming, building Scala scripts, and converting SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Good knowledge of writing applications in Python using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
  • Good understanding of Hadoop architecture and its components, such as NameNode, Secondary NameNode, JobTracker, HDFS, and DataNode.
  • Strong knowledge of the Snowflake data warehouse and NoSQL databases such as HBase and MongoDB.
  • Experience with AWS EMR and Cloudera Manager.
  • Good knowledge of AWS cloud services such as VPC, EC2, RDS, Redshift, Data Pipeline, EMR, DynamoDB, and SNS.
  • Experience in designing CI/CD pipeline flows with Jenkins.
  • Experience using Flume to handle large volumes of streaming data and Kafka to load log data from multiple sources into HDFS.
  • Expertise in J2EE technologies including JSP, Spring, Struts, JMS, Hibernate, JDBC, XML, XSLT, and JNDI.
  • Good knowledge of object-oriented programming (OOP) with C# and Java, and an understanding of data structures and algorithms.
  • Partnered with Data science team to work on large data sets and Machine learning notebooks to apply machine learning algorithms to process large amounts of data.
  • Worked on projects under Agile/Waterfall methodologies.
  • Committed to excellence; a self-motivated quick learner, team player, and prudent developer with strong problem-solving, analytical, and communication skills.

TECHNICAL SKILLS

Hadoop Ecosystem: Hive, HBase, Kafka, Spark, ZooKeeper, Oozie, Cloudera, YARN

OS: Windows XP, Windows 7/8 and 10, Linux.

Programming Languages: C#, Java, Python, Scala.

Cloud Technologies: AWS (EMR, S3, EC2)

Java API/Frameworks: Spring, Struts, Hibernate

IDE Tools: Eclipse, IntelliJ IDEA

Application/Web Servers: WebLogic, WebSphere, Apache Tomcat

Databases: MySQL, Oracle (accessed via JDBC)

PROFESSIONAL EXPERIENCE

Confidential, Atlanta, GA

Data Engineer

Responsibilities:

  • Designed scalable and cost-effective architecture in AWS with Data services for Data Life Cycle including collection, ingestion, storage, processing, and visualization.
  • Created Hive tables and loaded and analyzed data using HiveQL queries.
  • Imported data into AWS RDS from various sources using APIs.
  • Designed and developed ETL components to extract data from the Hadoop Data Lake and from the Teradata-based data warehouse.
  • Developed shell scripts to schedule full and incremental loads and to check data quality.
  • Migrated Hive scripts from on-premises to the AWS cloud.
  • Implemented optimization techniques in Hive, such as partitioning tables, denormalizing data, and bucketing, and in Spark, such as data serialization and broadcasting.
  • Tuned the performance of Spark applications by setting the right level of parallelism and tuning memory.
  • Helped the team build AWS native tables, validate data between Teradata and AWS Hive, and solve production issues.
  • Used HiveQL for data analysis, such as creating tables and importing structured data into specified tables for reporting.
  • Worked with the infrastructure team to troubleshoot connectivity issues within the AWS gateway, ODBC/JDBC connectivity issues, and Kerberos accounts and keytabs.
  • Imported and exported data into HDFS and Hive using Sqoop and migrated large amounts of data from different databases (e.g., Oracle, SQL Server) to Hadoop.
  • Used Spark SQL to create structured data with DataFrames and to query other data sources and Hive.
  • Flattened on-premises data and ingested it into Druid.
  • Involved in job management and developed job processing scripts using the Airflow scheduler.
  • Worked with the business team to gather requirements and participated in Agile planning meetings to finalize the scope.
  • Actively participated in Story Time, Sprint Planning, and Sprint Retrospective meetings.
  • Worked as L2 support over the weekend and monitored production-critical jobs.

Environment: Hadoop, Spark, HDFS, Hive Scripts, Scala, Sqoop, Hue, Hive, Automic, Shell Scripting, SQL, AWS.

Confidential, Tampa, FL

Spark/Hadoop Developer

Responsibilities:

  • Responsible for data extraction and ingestion from different data sources into the Hadoop Data Lake by creating ETL pipelines using Pig and Hive.
  • Designed and developed various modules in the Hadoop Big Data platform and processed data using MapReduce, Hive, Pig, Sqoop, and Oozie.
  • Extracted, transformed, and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
  • Extended the capabilities of DataFrames using user-defined functions in Python and Scala.
  • Resolved missing fields in DataFrame rows using filtering and imputation.
  • Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
  • Experience with Cloudera Hadoop upgrades and patches, installation of ecosystem products through Cloudera Manager, and Cloudera Manager upgrades.
  • Analyzed user behavior by running HiveQL queries on Big Data logs.
  • Experience in application development in a Business Intelligence/Data Warehouse environment.
  • Strong understanding of dimensional data modeling, SQL optimization, and metadata management (connections, data model, VizQL model).
  • Worked with the security team to troubleshoot connectivity issues within the Knox gateway, ODBC/JDBC connectivity issues, and Kerberos accounts and keytabs.
  • Worked with the data delivery team to set up new Hadoop and Linux users, set up Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users on the Hortonworks and Cloudera platforms.
  • Developed a data pipeline using Flume and Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
  • Expertise in uncovering data insights and identifying data issues using tools such as Tableau and Qlik.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Used HiveQL for data analysis, such as creating tables and importing structured data into specified tables for reporting.
  • Knowledge of using Pig to validate data ingested with Sqoop and Flume, with the cleansed data set pushed into HBase.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Involved in the architectural design of cluster infrastructure, resource mobilization, risk analysis, and reporting.
  • Experience in data visualization using QlikView and Tableau.
  • Commissioned and decommissioned data nodes and was involved in NameNode maintenance.
  • Regularly backed up and cleared logs from HDFS space to utilize data nodes optimally. Wrote shell scripts for time-bound command execution.
  • Edited and configured HDFS and tracker parameters.
  • Involved in code reviews of simple to complex MapReduce jobs using Hive and Pig.
  • Monitored the cluster using the Big Insights ionosphere tool. Imported data from various data sources and parsed it into structured data by region and date. Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked with ZooKeeper, Oozie, and Data Pipeline Operational Services to coordinate the cluster and schedule workflows.
  • Troubleshot performance issues through ETL and SQL tuning.
  • Installed the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
  • Loaded large amounts of data into HDFS using Apache Kafka.
  • Collected log data from web servers and integrated it into HDFS using Flume.

Environment: Hadoop, Spark, HDFS, Hadoop Pig Scripts, Scala, Sqoop, Hue, Hive, Impala, Oozie, Java, UNIX, Parquet, Python, Shell Scripting, SQL.

Confidential, New York, NY

Hadoop Consultant

Responsibilities:

  • Responsible for building a customer-centric Data Lake in Hadoop to serve as the analysis and data science platform.
  • Responsible for building scalable distributed data solutions on the Cloudera distribution of Hadoop.
  • Used Sqoop and Kafka for data migration and incremental imports into HDFS and Hive from various other data sources.
  • Modeled and built Hive tables to combine and store structured and unstructured data sources for the best possible access.
  • Used Cassandra to store billions of records to enable faster and more efficient querying, aggregation, and reporting.
  • Implemented real-time analytics on Cassandra data using the Thrift API.
  • Working knowledge of ETL tools (e.g., DataStage or Informatica).
  • Developed Spark jobs using the Scala and Python (PySpark) APIs.
  • Migrated SAS and Python programs into Spark jobs for various processes.
  • Involved in job management and developed job processing scripts using Oozie workflows.
  • Implemented optimization techniques in Hive, such as partitioning tables, denormalizing data, and bucketing.
  • Used Spark SQL to create structured data with DataFrames and to query other data sources and Hive.
  • Supported data scientists with data and platform setup for their analysis and migrated their finished products to production.
  • Cleansed and extracted meaningful information from clickstream data using Spark and Hive.
  • Tuned the performance of Spark applications by setting the right level of parallelism and tuning memory.
  • Used optimization techniques in Spark, such as data serialization and broadcasting.
  • Designed and built the reporting application, which uses Spark SQL to fetch HBase table data and generate reports.
  • Created HBase column families to store various data types coming from various sources.
  • Loaded data into the cluster from dynamically generated files.
  • Assisted in the upgrade, configuration, and maintenance of various Hadoop infrastructures.
  • Created common audit and error-logging processes and a job monitoring and reporting mechanism.
  • Optimized existing algorithms in Hadoop using Spark, Spark SQL, and DataFrames.
  • Implemented a POC for persisting clickstream data with Apache Kafka.
  • Followed Agile and Scrum principles in developing the project.

Environment: Hadoop, HDFS, Spark, Spark SQL, Sqoop, Hive, Python, Scala, PySpark, Oozie, Cloudera.

Confidential, White Plains, NY

Hadoop Consultant

Responsibilities:

  • Extensively involved in the design phase and delivered design documents.
  • Worked on Hortonworks HDP 2.3.
  • Worked with the business team to gather requirements and participated in Agile planning meetings to finalize the scope.
  • Imported and exported data into HDFS and Hive using Sqoop and migrated large amounts of data from different databases (e.g., Oracle, SQL Server) to Hadoop.
  • Used Pig as an ETL tool for transformations, joins, and some pre-aggregations before storing the data in HDFS.
  • Loaded and transformed large sets of structured and semi-structured data. Responsible for managing data coming from different sources.
  • Developed a data pipeline using Flume, Sqoop, Pig, and MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in creating Hive tables, loading data, and writing Hive queries.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in defining job flows. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Managed and reviewed the Hadoop log files.
  • Developed complex Hive queries using joins and partitions for large data sets per business requirements, loaded the filtered data from the source into edge node Hive tables, and validated the data.
  • Performed bucketing and partitioning of data using Apache Hive, which reduced processing time and helped generate proper sample insights.
  • Moved all log and text files generated by various products into an HDFS location.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization.

Environment: Hortonworks Hadoop, Eclipse, Java, Sqoop, Pig, Oozie, Hive, Flume, MySQL, Oracle DB.

Confidential

Hadoop Developer

Responsibilities:

  • Actively involved in the Agile (Scrum) methodology, including daily scrum meetings with the team.
  • Used Eclipse as the IDE for application development and Git as the code repository.
  • Developed web applications based on JavaServer Pages (JSP) and Java Enterprise Edition (JEE).
  • Used jQuery for user interface validations, HTML to develop user interfaces, and the JUnit framework for unit testing.
  • Analyzed, designed, and developed a J2EE-based application using Struts, Spring, and Hibernate.
  • Used the Spring Framework for dependency injection through configuration files.
  • Developed various test cases and performed unit testing using the JUnit framework.
  • Responsible for analysis, design, development, and integration of UI components with the J2EE backend.
  • Developed user interfaces using JSP, JavaScript, HTML, and CSS.
  • Implemented the project using the Spring Framework (Spring IoC and dependency injection, Spring MVC).
  • Used Hibernate in the persistence tier to connect to the database.
  • Developed external style sheets (CSS) to give the user interface a rich look.
  • Involved in implementing the DAO layer using Spring-Hibernate ORM.
  • Developed REST-based web services.
  • Integrated services with user interfaces using Ajax.
  • Integrated JSON responses from services with user interfaces.
  • Developed SQL queries and stored procedures for retrieving data.
  • Used Log4j for logging to trace errors.
  • Responsible for unit and integration testing.

Environment: Java 7, Servlets, JSP 2.0, Spring, Hibernate, SQL Developer, HTML, jQuery, JavaScript, CSS, Java Web Services, REST, Tomcat server, Eclipse, HTML5, SVN, JSON, Agile.
