
Data Engineer Resume


SUMMARY

  • Highly dedicated and versatile Data Engineer with around 6 years of IT industry experience across technologies, tools, and databases including Big Data, AWS, S3, Snowflake, Hadoop, Hive, Spark, Python, Sqoop, Tableau, SQL, PL/SQL, and Redshift.
  • 6+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data and data warehouse ETL technologies.
  • Have 4+ years of comprehensive experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark).
  • Experience with AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
  • Good working knowledge of Snowflake and Teradata databases.
  • Provided and constructed solutions for complex data issues.
  • Experience in the development and design of various scalable systems using Hadoop technologies in various environments. Extensive experience analyzing data using Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
  • Experience in understanding the security requirements for Hadoop.
  • Excellent Programming skills at a higher level of abstraction using Scala and Python.
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Strong experience and knowledge of real time data analytics using Spark Streaming.
  • Working knowledge of Amazon’s Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm - Kafka.
  • Experienced in working with in-memory processing in Spark, including transformations, Spark SQL, and Spark Streaming.
  • Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experienced in implementing POCs using Spark SQL libraries.
  • Improved the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Hands-on experience handling Hive tables using Spark SQL (see the sketch after this list).
  • Efficient in writing MapReduce programs and using the Apache Hadoop API to analyze structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
  • Hands-on experience in managing and reviewing Hadoop logs.
  • Good knowledge about YARN configuration.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig actions) and DAG-based workflows (Lambda).
  • Developed various shell scripts and python scripts to address various production issues.
  • Developed and designed automation framework using Python and Shell scripting
  • Good Knowledge of data compression formats like Snappy, Avro.
  • Experience with all stages of the SDLC and the Agile development model, from requirement gathering through deployment and production support.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
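To make the Spark SQL and Hive bullets above concrete, here is a minimal, hypothetical PySpark sketch of querying a partitioned Hive table and applying DataFrame transformations; the database, table, column, and path names are assumptions for illustration only, not details of any project listed here.

```python
# Minimal PySpark sketch: querying a partitioned Hive table through Spark SQL
# and applying DataFrame transformations. All names/paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("hive-spark-sql-example")
    .enableHiveSupport()  # needed so Spark SQL can see the Hive metastore
    .getOrCreate()
)

# Read one partition of a (hypothetical) Hive table via Spark SQL
daily_events = spark.sql(
    "SELECT user_id, event_type, event_ts "
    "FROM analytics.events WHERE dt = '2023-01-01'"
)

# Equivalent DataFrame-style filtering and aggregation
event_counts = (
    daily_events
    .filter(F.col("event_type").isNotNull())
    .groupBy("event_type")
    .count()
)

# Persist the results back to HDFS in a compressed, splittable format
event_counts.write.mode("overwrite").parquet("hdfs:///output/event_counts")
```

A sketch like this assumes a Spark installation with Hive support, where the metastore configuration comes from the cluster's hive-site.xml.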

TECHNICAL SKILLS

Databases: Oracle, SQL Server, MySQL, HBase, MongoDB, Redshift, DynamoDB and ElastiCache

Data Visualization Tools: Cognos, Tableau

Analytics Tools: AWS SageMaker, AWS Glue, AWS Athena, IAM, S3, EMR, EC2, DataBrew, CloudFormation

Programming Languages: Python, Scala, Shell scripting, PL/SQL, Perl

Operating System: Linux, Unix, Windows

Integration Tools: Git, Bitbucket, Bamboo, Ant, Maven

Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, YARN, Impala, Sqoop, Flume, Oozie, Zookeeper, Spark, Kafka, Spark SQL

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer

Responsibilities:

  • Understood the requirements and prepared the architecture document for the Big Data project.
  • Imported and exported data between different relational data sources (DB2, SQL Server, Teradata) and HDFS using Sqoop.
  • Migrated complex MapReduce programs into in-memory Spark processing using transformations and actions.
  • Worked on creating the RDDs and DataFrames for the required input data and performed the data transformations using Spark with Python (see the sketch after this list).
  • Involved in developing Spark SQL queries and DataFrames, importing data from data sources, performing transformations and read/write operations, and saving the results to an output directory in HDFS.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Developed Pig UDFs for manipulating the data according to business requirements and worked on developing custom Pig loaders.
  • Worked on the Oozie workflow engine for job scheduling.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Experienced in managing and reviewing the Hadoop log files using shell scripts.
  • Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.
  • Worked on different file formats like Sequence files, XML files, and Map files using MapReduce programs.
  • Worked with the Avro data serialization system to handle JSON data formats.
  • Used AWS S3 to store large amounts of data in a single repository.
  • Involved in building applications using Maven and integrating with continuous integration servers like Bamboo to build jobs.
  • Used an Enterprise Data Warehouse database to store the information and make it accessible across the organization.
  • Responsible for preparing technical specifications, analyzing functional Specs, development, and maintenance of code.
  • Worked with the Data Science team to gather requirements for various data mining projects
  • Wrote shell scripts to automate rolling day-to-day processes.
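As referenced in the RDD/DataFrame bullet above, here is a minimal sketch of moving a MapReduce-style aggregation into in-memory Spark processing with PySpark; the log layout, field names, and HDFS paths are hypothetical.

```python
# Minimal sketch: parse raw log lines into an RDD of Rows, convert to a
# DataFrame, and aggregate with Spark SQL. All names/paths are hypothetical.
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.appName("log-parsing-example").getOrCreate()
sc = spark.sparkContext

# Load raw, semi-structured log lines from HDFS
raw_logs = sc.textFile("hdfs:///data/raw/app_logs/2023-01-01")

def parse_line(line):
    """Split a tab-delimited log line into (ts, level, message); drop bad rows."""
    parts = line.split("\t")
    return Row(ts=parts[0], level=parts[1], message=parts[2]) if len(parts) == 3 else None

parsed = raw_logs.map(parse_line).filter(lambda r: r is not None)

# Convert the RDD to a DataFrame so the logs can be queried like a table
logs_df = spark.createDataFrame(parsed)
logs_df.createOrReplaceTempView("app_logs")

# An aggregation that would otherwise have been a hand-written MapReduce job
spark.sql("SELECT level, COUNT(*) AS cnt FROM app_logs GROUP BY level").show()
```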

Confidential

Data Engineer

Responsibilities:

  • Created S3 buckets and IAM roles in the non-prod region (see the boto3 sketch after this list).
  • Implemented Spark transformations and ingestion into our data lake.
  • Hands-on experience implementing and designing EMR clusters in the non-prod and prod regions, sized according to data ingestion volumes.
  • Hands-on experience implementing Spark jobs.
  • Designed and implemented Control-M jobs to complete the ingestion process.
  • Designed and implemented Oozie jobs to run the ingestion.
  • Designed the ingestion cluster to perform data transformations.
  • Configured Spark jobs for quick ingestion and allocated enough resources to handle 10 TB of data on a daily basis.
  • Responsible for Account management, IAM Management and Cost management.
  • Designed AWS CloudFormation templates to create VPCs, subnets, and NAT gateways to ensure successful deployment of web applications and database templates.
  • Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
  • Managed IAM users by creating new users, granting them limited access as needed, and assigning roles and policies to specific users.
  • Created RDDs in Spark.
  • Extracted data from the data warehouse (Teradata) into Spark RDDs.
  • Experience with Spark using Scala and Python.
  • Implemented build and deploy plans from scratch.
  • Hands-on experience with Bitbucket and Bamboo.
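A minimal boto3 sketch of the S3 bucket and IAM role setup referenced in the first bullet above; the bucket name, role name, region, and attached managed policy are illustrative assumptions, not details from the engagement.

```python
# Minimal boto3 sketch: create a non-prod S3 bucket and an IAM role that
# EC2/EMR instances can assume. All names and the region are hypothetical.
import json
import boto3

s3 = boto3.client("s3", region_name="us-west-2")
iam = boto3.client("iam")

# S3 bucket for the data lake landing zone
s3.create_bucket(
    Bucket="example-datalake-nonprod",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)

# Trust policy letting EC2 (and therefore EMR instances) assume the role
assume_role_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(
    RoleName="example-ingestion-role",
    AssumeRolePolicyDocument=json.dumps(assume_role_policy),
)

# Grant read-only S3 access through an AWS managed policy
iam.attach_role_policy(
    RoleName="example-ingestion-role",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
)
```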

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows. Experienced in managing and reviewing Hadoop log files.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data (see the mapper sketch after this list).
  • Load and transform large sets of structured, semi structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Conducted functional, system, data, and regression testing.
  • Involved in Bug Review meetings and participated in weekly meetings with management team.
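Here is a minimal Python mapper of the kind used with Hadoop Streaming for XML-heavy data, as referenced above; the record layout and attribute names are assumptions. It would be submitted with the hadoop-streaming JAR alongside a companion reducer that sums the emitted counts per key.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch: each stdin line is assumed to hold
# one <event .../> XML fragment; emits "event_type<TAB>1" pairs for a reducer.
import sys
import xml.etree.ElementTree as ET

def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            event = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records rather than failing the task
        event_type = event.get("type", "unknown")
        print(f"{event_type}\t1")

if __name__ == "__main__":
    main()
```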

Confidential

Full Stack Developer

Responsibilities:

  • Gathered specifications for the library site from different departments and users of the services.
  • Assisted in proposing suitable UML class diagrams for the project.
  • Wrote SQL scripts to create and maintain the database, roles, users, tables, views, procedures, and triggers in Oracle
  • Designed and implemented the UI using HTML, JSP, JavaScript and Java.
  • Implemented Multi-threading functionality using Java Threading API
  • Extensively worked on IBM WebSphere 6.0 while implementing the project.
  • Developed the UI screens using HTML5, DHTML, XML, JavaScript, Ajax, jQuery, custom tags, JSTL, DOM layout, and CSS3.
  • Building skills in the following technologies: WebLogic, Spring Batch, Spring, Java.
  • Used an XML SAX parser to process XML files containing simulated test data.
  • Designed and developed REST-based services by constructing URIs, using JAX-RS annotations and the Jersey implementation.
  • Used JUnit and EasyMock frameworks for unit testing of the application and followed the Test-Driven Development (TDD) methodology.
  • Developed integration techniques using the JMS along with Mule ESB to integrate different applications.
  • Used Oracle as backend database using Windows OS. Involved in development of Stored Procedures, Functions and Triggers.
  • Involved in creating single-page applications using AngularJS components and directives, and implemented custom directives as part of building reusable components.
