Data Engineer Resume
SUMMARY
- Over 11 years of IT experience.
- 4+ years of experience in Big Data technologies, including Hadoop 2.6.0 (Cloudera 5.10) and Hortonworks; well versed in Spark, Python, Hive, Impala, HBase, and Sqoop.
- Experience in importing and exporting data between HDFS and Relational Database Systems (RDBMS) using Sqoop (see the Sqoop sketch after this summary).
- Good knowledge of Python for developing and running Spark jobs.
- Good knowledge of AWS cloud services.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
- Expert in working with Hive data warehouses: creating tables and distributing data by implementing partitioning and bucketing (see the Hive DDL sketch after this summary).
- Work with technical teams and end users to understand business requirements and identify data solutions.
- Responsible for gathering requirements, performing analysis, and formulating requirement specifications from consistent stakeholder inputs.
- In-depth knowledge of the agile, iterative Software Development Life Cycle (SDLC) process.
- Develop database migration strategies, including schema design and migration of different kinds of database source systems to a Hadoop data lake, and automate migration processes.
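A minimal sketch of the kind of Sqoop import referenced above, driven from Python via subprocess; the JDBC URL, credentials file, table, and target directory are hypothetical placeholders, and Sqoop is assumed to be on the PATH.

```python
import subprocess

# Hypothetical connection details; a real job would read these from config.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost:3306/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.db_password",  # keep secrets off the command line
    "--table", "orders",
    "--target-dir", "/data/raw/orders",
    "--num-mappers", "4",
]

# Fail loudly if Sqoop returns a non-zero exit code.
subprocess.run(sqoop_import, check=True)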
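A companion sketch of the partitioned, bucketed Hive table design mentioned in the summary, issued through PySpark's Hive support; the database, table, and column names are illustrative only.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-ddl-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partition by load date for partition pruning; bucket by customer_id
# to speed up joins and sampling on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DECIMAL(10,2)
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
""")
```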
TECHNICAL SKILLS
Operating Systems: Windows, Linux, Unix
Big Data Technologies: Hadoop 2.6.0-cdh5.10 (Cloudera), Hortonworks, Hue, Ambari, Hive, Impala, HBase, Sqoop, Spark 2.3.4, PySpark, Java UDFs
Databases & Tools: Oracle, MS SQL Server, MySQL
Programming Languages: Python 3.7, Java 8, VB, VB.net
Scripting Languages: UNIX Shell Scripting
Development IDE / Tools: Zeppelin Notebook, Eclipse, VSCode
Source Code Control: Git, PVCS, SVN
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- Developed Sqoop scripts for the initial loading of data.
- Wrote shell scripts to extract Adobe clickstream data and ingest it into HDFS.
- Developed Spark data transformations.
- Developed Python scripts using Spark and machine learning libraries (see the PySpark sketch after this role's technology list).
- Actively participated in requirement gathering and business requirement analysis.
Technologies: Hortonworks on RHEL 7.4, Hive, Spark 2.3.4, Python, Zeppelin Notebook, Unix shell scripting, Teradata, Oracle, Adobe Clickstream, and data modeling.
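A minimal PySpark sketch of the kind of transformation described above, assuming a tab-separated Adobe clickstream extract with hit_time_gmt (epoch seconds) and visitor_id columns; the paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-transform").getOrCreate()

# Read a raw tab-separated clickstream extract from HDFS.
clicks = (
    spark.read
    .option("sep", "\t")
    .option("header", "true")
    .csv("/data/raw/adobe_clickstream")
)

# Derive a calendar date from the epoch-seconds hit timestamp,
# then count hits per visitor per day.
daily_hits = (
    clicks
    .withColumn("hit_date", F.to_date(F.from_unixtime("hit_time_gmt")))
    .groupBy("visitor_id", "hit_date")
    .agg(F.count("*").alias("hits"))
)

# Write the curated output to HDFS, partitioned by date.
daily_hits.write.mode("overwrite").partitionBy("hit_date").parquet("/data/curated/daily_hits")
```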
Data Engineer
Confidential
Responsibilities:
- Developed Sqoop jobs with Unix scripts to load data from SQL Server into HDFS.
- Developed Hive queries (HQL) for data mapping and validation (see the validation sketch after this role).
- Developed a customized application using Unix bash scripts for TSV file ingestion (an ingestion sketch also follows this role).
- Wrote shell scripts for loading and ingesting data into HDFS.
- Analyzed the bottlenecks and dependencies of existing systems.
- Responsible for direct interaction with the client to gather requirements.
- Performed requirement analysis and design in building the data lake.
- Worked with the business and architects to understand the impact of requirements and streamline them.
- Acted as an interface between the business and technical teams to identify and resolve issues.
- Actively participated in requirement gathering and business requirement analysis.
Technologies: Cloudera 5.8.3, Hive, Impala, Java UDFs, Unix shell scripts, PySpark, SQL Server.
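A minimal sketch of the kind of Hive validation query mentioned above, run through PySpark's Hive support; the staging and curated table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-validation")
    .enableHiveSupport()
    .getOrCreate()
)

# Reconcile row counts between the source staging table and the curated table.
src_cnt = spark.sql("SELECT COUNT(*) AS cnt FROM staging.orders").first()["cnt"]
tgt_cnt = spark.sql("SELECT COUNT(*) AS cnt FROM curated.orders").first()["cnt"]

if src_cnt != tgt_cnt:
    raise ValueError(f"Row count mismatch: staging={src_cnt}, curated={tgt_cnt}")
```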
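And a sketch of the TSV ingestion flow, pushing landed files into HDFS; the original used bash scripts, so this Python version via subprocess is an equivalent illustration, and the directory paths are placeholders.

```python
import subprocess
from pathlib import Path

landing = Path("/landing/tsv")   # local directory where TSV files arrive
archive = landing / "archive"
archive.mkdir(exist_ok=True)
hdfs_target = "/data/raw/tsv"    # HDFS ingestion directory

# Push each landed TSV file into HDFS, then archive the local copy.
for tsv in sorted(landing.glob("*.tsv")):
    subprocess.run(["hdfs", "dfs", "-put", "-f", str(tsv), hdfs_target], check=True)
    tsv.rename(archive / tsv.name)
```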