
Data Engineer Resume

Data Engineer, Denver

SUMMARY

  • Software professional with 7+ years of IT experience in the Big Data ecosystem, covering ingestion, storage, querying, processing, and analysis of big data.
  • Extensive experience working in various verticals such as Confidential.
  • Experience in creating applications using Spark with Python; a minimal sketch follows this summary.
  • Hands on experience in using Hadoop ecosystem components like MapReduce, HDFS, HBase, Zookeeper, Hive, Sqoop and Oozie.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Worked on developing, monitoring, and scheduling jobs using UNIX shell scripting.
  • Experienced in installing, configuring, and administering Hadoop clusters.
  • Hands-on experience with Hadoop applications.
  • Experience in working with MapReduce programs, Pig scripts, and Hive commands to deliver the best results.
  • Good knowledge of and experience with Hive query optimization and performance tuning.
  • Hands-on experience in writing Pig Latin scripts and custom implementations using UDFs.
  • Experience in tuning Hadoop clusters to achieve good processing performance.
  • Experience in upgrading existing Hadoop clusters to the latest releases.
  • Experience in Data Integration between Pentaho and Hadoop.
  • Experience in supporting data analysis projects using Elastic MapReduce on the Confidential Web Services (AWS) cloud. Performed export and import of data to and from S3.
  • Well trained in Problem Solving Techniques, Operating System Concepts, Programming Basics, Structured Programming and RDBMS.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Strong analytical skills with the ability to quickly understand clients' business needs. Involved in meetings to gather information and requirements from clients. Led the team and coordinated onsite and offshore work.
  • Experience includes requirements gathering/analysis, design, development, versioning, integration, documentation, testing, build, and deployment.
  • Technical professional with management skills, excellent business understanding and strong communication skills.
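
A minimal sketch of the kind of Spark-with-Python application described in the summary, assuming a hypothetical CSV source path and Hive target table (the example-bucket location and analytics.sales_raw are illustrative names, not details from an actual engagement):

    # ingest_to_hive.py - illustrative PySpark ingestion sketch; paths and table names are hypothetical
    from pyspark.sql import SparkSession

    def main():
        # Hive support lets saveAsTable register the output in the Hive metastore
        spark = (
            SparkSession.builder
            .appName("csv-to-hive-ingest")
            .enableHiveSupport()
            .getOrCreate()
        )

        # Hypothetical source location; replace with the real HDFS/S3 path
        source_path = "s3://example-bucket/raw/sales/"

        df = (
            spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv(source_path)
        )

        # Light profiling before loading, in the spirit of the validation work above
        print(f"Ingesting {df.count()} rows, {len(df.columns)} columns")

        # Write as a managed Hive table; overwrite keeps the sketch idempotent
        df.write.mode("overwrite").saveAsTable("analytics.sales_raw")

        spark.stop()

    if __name__ == "__main__":
        main()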

TECHNICAL SKILLS

Languages: UNIX, SQL, Shell Script

Big Data and Hadoop Ecosystem: MapReduce, Sqoop, Hive, HDFS.

Databases: MySQL

Cloud Computing: Confidential Web Services (EC2, EMR, S3, RDS, SQS, Aurora, Athena)

Build Tools: Maven, Ant

Database Tools: SQL Developer, SQL Workbench

Development Tools: Eclipse, PuTTY, IntelliJ, PyCharm

PROFESSIONAL EXPERIENCE

Confidential, Denver

Data Engineer

Responsibilities:

  • Work closely with Engagement Managers and Data Scientists to understand the requirements.
  • Ingest and integrate different datasets from various sources.
  • Built Spark applications to reduce the workload on SQL.
  • Created pipelines in Python on the Spark framework to ingest data from different sources into the client's destination.
  • Validated data by profiling it against databases such as Hive.
  • Worked on AWS services such as S3, EC2, EMR, AWS Glue, Athena, IAM, SNS, DynamoDB, and CloudWatch; experienced in processing and storing data sets.
  • Experience using XML, Parquet, CSV, and Confidential file formats, as well as compressed formats such as Gzip and Snappy.
  • Created a Python application that consumes messages from AWS SQS, processes each request, and sends back the appropriate machine learning model results (sketched after this list).
  • Created service-based Python applications that move data across accounts by assuming an IAM role, since direct access is denied (also sketched after this list).
  • Moved data to and from HDFS and created tables on top of it.
  • Used Beeline as the Hive client for faster and better performance.
  • Used Sqoop to move large (historical) datasets into HDFS.
  • Involved in extracting customers' big data from various data sources, including mainframes and source databases, into AWS S3 for storage.
  • Created Hive tables as per requirements, defined as managed or external tables with appropriate static and dynamic partitions for efficiency.
  • Created shell scripts to execute batch jobs as cron jobs as part of a POC.
  • Used secure copy to transfer files to and from remote servers.
  • Generated and maintained RSA public/private key pairs and used SSO to sign in to edge nodes and remote servers.
  • Developed various components in Python that can be used in ETL batch processing jobs.
  • Used Jenkins to deploy the Airflow jobs and applications to the server.
  • Used Git for source control management, which gives a significant speed advantage over centralized systems that must communicate with a server.
  • Used the Airflow workflow engine to manage interdependent jobs and to automate several types of Hadoop jobs, such as Python MapReduce, Spark, Hive, and Sqoop, as well as system-specific jobs.
  • Have knowledge of Snowflake, which is used by neighboring teams, but have not worked on it directly.
  • Worked in Agile methodology.
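
The SQS-driven scoring loop described above might look roughly like the following sketch; the queue URLs, region, and run_model stub are hypothetical placeholders rather than actual project details:

    # sqs_consumer.py - illustrative sketch of an SQS consumer that returns ML model results; names are placeholders
    import json
    import boto3

    sqs = boto3.client("sqs", region_name="us-east-1")

    REQUEST_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/model-requests"  # hypothetical
    RESULT_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111111111111/model-results"    # hypothetical

    def run_model(payload):
        # Placeholder for the actual machine learning model call
        return {"request_id": payload.get("request_id"), "score": 0.0}

    def poll_once():
        # Long-poll for up to 10 messages at a time
        resp = sqs.receive_message(
            QueueUrl=REQUEST_QUEUE_URL,
            MaxNumberOfMessages=10,
            WaitTimeSeconds=20,
        )
        for msg in resp.get("Messages", []):
            payload = json.loads(msg["Body"])
            result = run_model(payload)
            # Send the model result back, then delete the processed request
            sqs.send_message(QueueUrl=RESULT_QUEUE_URL, MessageBody=json.dumps(result))
            sqs.delete_message(QueueUrl=REQUEST_QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

    if __name__ == "__main__":
        while True:
            poll_once()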
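
For the cross-account data movement, a minimal assume-role sketch could look like this; the role ARN, bucket names, and object keys are invented for illustration:

    # cross_account_copy.py - illustrative sketch of assuming an IAM role to reach another account's S3 bucket
    import boto3

    def s3_client_for_role(role_arn, session_name="cross-account-copy"):
        # Exchange the caller's credentials for temporary credentials in the target account
        sts = boto3.client("sts")
        creds = sts.assume_role(RoleArn=role_arn, RoleSessionName=session_name)["Credentials"]
        return boto3.client(
            "s3",
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )

    if __name__ == "__main__":
        # Hypothetical role and bucket/key names
        target_s3 = s3_client_for_role("arn:aws:iam::222222222222:role/data-exchange-role")
        source_s3 = boto3.client("s3")

        body = source_s3.get_object(Bucket="source-bucket", Key="exports/data.parquet")["Body"].read()
        target_s3.put_object(Bucket="partner-bucket", Key="incoming/data.parquet", Body=body)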

Tools: Hadoop, Python, Confidential Web Services, Spark, Spark SQL, Hue, Hive, Jenkins, HDFS, Sqoop, Unix/Linux.

Confidential, CA

Data Engineer

Responsibilities:

  • Involved in extracting customers' big data from various data sources, including mainframes and source databases, into AWS S3, where the data lake is located.
  • Created short-lived instances using AWS EMR as part of ETL jobs.
  • Built Spark applications to reduce the workload on SQL and the amount of heavy Java coding.
  • Used Hive on top of Spark for faster and better performance.
  • Developed various components in Java using Spring Batch that can be used in ETL batch processing jobs.
  • Developed Python and shell scripts to schedule jobs.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Created Hive tables as per requirements, defined as managed or external tables with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data (see the sketch after this list).
  • Developed UDFs in Hive.
  • Developed Python scripts to create batch jobs.
  • Used the Azkaban workflow engine to manage interdependent jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Used Jenkins to deploy the Azkaban jobs to the server.
  • Installed and configured various components of Hadoop ecosystem and maintained their integrity.
  • Implemented the Fair Scheduler on the JobTracker to allocate a fair share of resources to small jobs.
  • Involved in creating streaming applications using Spark with Scala.
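
A sketch of the partitioned and bucketed Hive layout described above, issued here through Spark SQL and the DataFrame writer; the database, table, column names, and bucket count are invented for illustration:

    # hive_layout_sketch.py - illustrative partitioning and bucketing; all names are hypothetical
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-layout-sketch").enableHiveSupport().getOrCreate()

    # External, partitioned Hive table over a hypothetical S3 location
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(10, 2)
        )
        PARTITIONED BY (order_date STRING)
        STORED AS PARQUET
        LOCATION 's3://example-bucket/warehouse/orders/'
    """)

    # Dynamic-partition load from a hypothetical staging table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE analytics.orders PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date
        FROM analytics.orders_staging
    """)

    # Bucketing via the DataFrame writer: cluster rows by customer_id into 32 buckets
    (spark.table("analytics.orders_staging")
         .write
         .bucketBy(32, "customer_id")
         .sortBy("customer_id")
         .mode("overwrite")
         .saveAsTable("analytics.orders_bucketed"))

    spark.stop()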

Tools: Confidential Web Services, Hadoop, Spark, Scala, Spark SQL, Data Lake, H2, Sqoop, Hive, Java, Postgres, Python, Jenkins.

Confidential

Data Engineer

Responsibilities:

  • Involved in extracting customers' big data from various data sources, including mainframes and source databases, into AWS S3, where the data lake is located.
  • Created short-lived instances using AWS EMR as part of ETL jobs.
  • Used Hive on top of Spark for faster and better performance.
  • Developed various components in Java using Spring Batch that can be used in ETL batch processing jobs.
  • Developed Python and shell scripts to schedule jobs.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
  • Created Hive tables as per requirements, defined as managed or external tables with appropriate static and dynamic partitions for efficiency.
  • Implemented partitioning and bucketing in Hive for better organization of the data.
  • Developed UDFs in Hive.
  • Developed Python scripts to create batch jobs.
  • Used the Airflow workflow engine to manage interdependent jobs and to automate several types of Hadoop jobs, such as Python, Hive, and Sqoop, as well as system-specific jobs (see the DAG sketch after this list).
  • Used Jenkins to deploy the Airflow jobs to the server.
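
An Airflow DAG of the kind described above might be sketched as follows, assuming Airflow 2.x import paths; the DAG id, task names, scripts, and connection strings are illustrative placeholders, not actual project configuration:

    # etl_dag.py - illustrative Airflow DAG chaining interdependent Sqoop, Hive, and Python steps
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="daily_etl_sketch",           # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:

        # Pull the day's data from the source database into HDFS (placeholder command)
        sqoop_import = BashOperator(
            task_id="sqoop_import",
            bash_command="sqoop import --connect jdbc:mysql://source-db/sales "
                         "--table orders --target-dir /data/raw/orders/{{ ds }}",
        )

        # Load the imported files into a partitioned Hive table (placeholder script)
        hive_load = BashOperator(
            task_id="hive_load",
            bash_command="hive --hivevar run_date={{ ds }} -f /opt/etl/load_orders.hql",
        )

        # Post-processing step implemented in Python (placeholder script)
        python_postprocess = BashOperator(
            task_id="python_postprocess",
            bash_command="python /opt/etl/postprocess_orders.py --date {{ ds }}",
        )

        # Declare the dependencies between the interdependent jobs
        sqoop_import >> hive_load >> python_postprocess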

Tools: Confidential Web Services, Hadoop, Spark, Scala, Spark SQL, Data Lake, H2, Sqoop, Hive, Postgres, Python, Jenkins.

Confidential, Denver

Big Data | Hadoop Consultant

Responsibilities:

  • Worked on analyzing Hadoop clusters and different big data analytics tools, including Pig, Hive, and Sqoop. Responsible for building scalable distributed data solutions using Cloudera Hadoop.
  • Involved in importing and exporting data (SQL Server, Oracle, CSV, and text files) from local/external file systems and RDBMS to HDFS. Loaded log data into HDFS using Flume.
  • ETL data cleansing, integration, and transformation using Pig: responsible for managing data from disparate sources.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Data Warehousing: Designed a data warehouse using Hive, created and managed Hive tables in Hadoop.
  • Created various UDFs in Pig and Hive to manipulate the data for various computations.
  • Created MapReduce functions for certain computations.
  • Workflow Management: Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Extensive hands-on experience with Hadoop file system commands for file handling operations.
  • Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Worked on developing, monitoring, and scheduling jobs using UNIX shell scripting.
  • Parsed XML files using MapReduce to extract sales-related attributes and stored them in HDFS (see the sketch after this list).
  • Involved in building TBUILD scripts to import data from Teradata using Teradata Parallel Transporter APIs.
  • Used Spark to enhance the performance of the project.
  • Also have good knowledge of Scala.
  • Good knowledge of and exposure to Cassandra.
  • Worked in an Agile methodology.
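
The XML-parsing MapReduce step could be sketched as a Hadoop Streaming mapper in Python; the element names (sale, product_id, amount) and the one-record-per-line input assumption are invented for illustration:

    #!/usr/bin/env python
    # sales_xml_mapper.py - illustrative Hadoop Streaming mapper extracting sales attributes from XML records
    # Example launch (hypothetical paths):
    #   hadoop jar hadoop-streaming.jar -input /data/raw/sales_xml -output /data/parsed/sales \
    #       -mapper sales_xml_mapper.py -reducer aggregate_reducer.py
    import sys
    import xml.etree.ElementTree as ET

    def main():
        for line in sys.stdin:
            line = line.strip()
            if not line.startswith("<sale"):
                continue  # skip headers and non-record lines
            try:
                record = ET.fromstring(line)
            except ET.ParseError:
                continue  # ignore records that fail to parse
            # Emit tab-separated key/value pairs: product id -> sale amount
            product_id = record.findtext("product_id", default="UNKNOWN")
            amount = record.findtext("amount", default="0")
            print(f"{product_id}\t{amount}")

    if __name__ == "__main__":
        main()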

Tools: Spark, MapReduce, Hive, Cloudera, Python, and UNIX scripting.

Confidential

Software Developer

Responsibilities:

  • Designed and added new functionality to extend the existing application using J2EE, XML, Ajax, Servlets, and JSP.
  • Studied the impact of the requirements and prepared the functional and technical requirement documents.
  • Created different batch programs to clean up tables in the DB2 database.
  • Extensively used Collections and Exceptions in the batch programs for database cleanup.
  • Worked on UNIX shell scripting to run the JAR file created for batch program.
  • Used the Struts framework for UI design and validations.
  • Developed Action classes, which act as the controller in the Struts framework.
  • Client-side validations are done using JavaScript and server-side validations using the Struts validator framework.
  • AJAX forms are created for update operations.
  • Data was converted into Confidential using JSP tags.
  • Enhanced the existing application to meet the business requirement.
  • Established JDBC connections using a database connection pool.
  • Wrote complex SQL statements to retrieve data from the DB2 database.
  • Participated in the Production support and maintenance of the project.
  • Created new tables in DB2 database.
  • The application was developed using Eclipse on Windows XP and deployed on an Apache Tomcat 6.0 server on Windows Server 2003.
  • Used ClearCase version control system.
  • Performed unit testing for the application using JUnit.

Tools: Java, JavaScript, Ajax, Confidential, Struts, Design Patterns, Eclipse, Apache Tomcat, DB2, UNIX, ClearCase, JUnit
