
Data Engineer Resume


SUMMARY

  • Over 11 years of IT experience.
  • 4+ years of experience with Big Data technologies, including Hadoop 2.6.0 on Cloudera CDH 5.10 and Hortonworks; well versed in Spark, Python, Hive, Impala, HBase, and Sqoop.
  • Experience in importing and exporting data with Sqoop between HDFS and Relational Database Systems (RDBMS), in both directions (see the Sqoop sketch after this list).
  • Good working knowledge of Python for developing Spark applications.
  • Good working knowledge of AWS cloud services.
  • In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
  • Expert in working with the Hive data warehouse: creating tables and distributing data through partitioning and bucketing (a PySpark layout sketch also follows this list).
  • Work with technical teams and end users to understand business requirements and identify data solutions.
  • Responsible for gathering requirements, performing the analysis, and formulating requirement specifications from consistent stakeholder inputs.
  • In-depth knowledge of the agile, iterative Software Development Life Cycle (SDLC).
  • Develop database migration strategies, including schema design and migration of varied database source systems to a Hadoop data lake, and automate migration processes.
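
For illustration, a minimal sketch of the Sqoop import/export flow referenced in this list, driven from Python via subprocess. The JDBC URL, credentials file, table names, and HDFS paths are hypothetical placeholders, not details of any actual engagement.

    import subprocess

    # Hypothetical connection details -- placeholders, not real systems.
    JDBC_URL = "jdbc:mysql://db-host:3306/sales"

    def sqoop_import():
        """Import an RDBMS table into HDFS (RDBMS -> HDFS)."""
        subprocess.run([
            "sqoop", "import",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop.pwd",  # keep passwords off the command line
            "--table", "orders",
            "--target-dir", "/data/raw/orders",
            "--num-mappers", "4",
        ], check=True)

    def sqoop_export():
        """Export HDFS data back into an RDBMS table (HDFS -> RDBMS)."""
        subprocess.run([
            "sqoop", "export",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqoop.pwd",
            "--table", "orders_summary",
            "--export-dir", "/data/out/orders_summary",
        ], check=True)

    if __name__ == "__main__":
        sqoop_import()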
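
And a minimal PySpark sketch of the Hive table-layout work mentioned above: writing a DataFrame into a Hive-managed table whose data is distributed by partitioning and bucketing. Database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-layout-sketch")
             .enableHiveSupport()  # route saveAsTable through the Hive metastore
             .getOrCreate())

    df = spark.table("staging.orders")  # hypothetical staging table

    # Partition by load date (one directory per day) and bucket by customer_id
    # (fixed number of files per partition) to prune scans and speed up joins.
    (df.write
       .partitionBy("load_date")
       .bucketBy(16, "customer_id")
       .sortBy("customer_id")
       .format("parquet")
       .mode("overwrite")
       .saveAsTable("analytics.orders"))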

TECHNICAL SKILLS

Operating Systems: Windows, Linux, Unix

Big Data Technologies: Hadoop 2.6.0 (Cloudera CDH 5.10), Hortonworks, Hue, Ambari, Hive, Impala, HBase, Sqoop, Spark 2.3.4, PySpark, Java UDFs

Databases & Tools: ORACLE, MS SQL Server, MySQL

Programming Languages: Python 3.7, Java 8, VB, VB.net

Scripting Languages: UNIX Shell Scripting

Development IDE / Tools: Zeppelin Notebook, Eclipse, VSCode

Source Code Control: Git, PVCS, SVN

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential

Responsibilities:

  • Developed Sqoop scripts for the initial loading of data.
  • Wrote shell scripts to extract data from Adobe and ingest it into HDFS.
  • Developed Spark transformations of data (a PySpark sketch follows the Technologies line below).
  • Developed Python scripts using Spark and machine learning libraries.
  • Actively participated in requirement gathering and in analyzing business requirements.

Technologies: Hortonworks on RHEL 7.4, Hive, Spark 2.3.4, Python, Zeppelin Notebook, Unix shell scripting, Teradata, Oracle, Adobe Clickstream, and data modeling.
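
As referenced in the bullets above, a minimal sketch of the kind of Spark transformation applied to ingested clickstream data. The column names and HDFS paths are hypothetical placeholders.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("clickstream-sketch").getOrCreate()

    # Hypothetical HDFS path to raw hit-level clickstream data.
    hits = spark.read.parquet("/data/raw/clickstream")

    # Typical transformation chain: filter bad rows, derive a column, aggregate.
    daily_visits = (hits
        .filter(F.col("visitor_id").isNotNull())
        .withColumn("hit_date", F.to_date("hit_timestamp"))
        .groupBy("hit_date", "page_url")
        .agg(F.countDistinct("visitor_id").alias("unique_visitors"),
             F.count("*").alias("page_views")))

    daily_visits.write.mode("overwrite").parquet("/data/curated/daily_visits")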

Data Engineer

Confidential

Responsibilities:

  • Developed Sqoop jobs with Unix scripts to load data from SQL Server into HDFS.
  • Developed Hive queries (HQL) for data mapping and validation.
  • Developed a customized application using Unix bash scripts for TSV file ingestion (a PySpark equivalent is sketched after the Technologies line below).
  • Wrote shell scripts for loading and ingesting data into HDFS.
  • Analyzed the bottlenecks and dependencies of existing systems.
  • Responsible for direct interaction with the client to gather requirements.
  • Performed requirement analysis and design in building the data lake.
  • Worked with the business and architects to understand the impact of requirements and streamline them.
  • Served as an interface between the business and technical teams to identify and resolve issues.
  • Actively participated in requirement gathering and in analyzing business requirements.

Technologies: Cloudera CDH 5.8.3, Hive, Impala, Java UDFs, Unix shell scripts, PySpark, SQL Server.
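
A minimal sketch of the TSV ingestion flow described above, rewritten in PySpark for illustration; the schema options, landing path, and table name are hypothetical placeholders rather than details of the actual bash-based application.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("tsv-ingest-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read tab-separated files from an HDFS landing zone (hypothetical path).
    tsv = (spark.read
           .option("sep", "\t")
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/data/landing/tsv/"))

    # Land the data as Parquet in a Hive table for downstream Hive/Impala queries.
    tsv.write.mode("append").format("parquet").saveAsTable("raw.tsv_feed")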
