Data Engineer Resume

SUMMARY:

  • A Data Engineer with newly acquired skills, an insatiable intellectual curiosity, and the ability to mine hidden gems located within large sets of structured, semi-structured, and unstructured data.
  • Able to combine a heavy dose of mathematics and applied statistics with visualization and a healthy sense of exploration.
  • I believe these skills can be valuable to your team; please take at least 10 seconds to review my profile.

TECHNICAL SKILLS:

Hadoop/Big Data Technologies: Spark-Scala, Kafka, Spark Streaming, MLlib, Sqoop, HBase, HDFS, MapReduce, Pig, Hive, Zeppelin (Distributions: Databricks, Hortonworks, and Cloudera)

Programming Languages and Scripting: Java (JDK 5/JDK 6), C/C++, Python, Scala, HTML, SQL

Operating Systems: UNIX, Windows, Linux, Mac OS X

Application Servers: IBM WebSphere, Tomcat

Web Technologies: JSP, Servlets, JDBC, JavaScript, CSS

Databases: Oracle 9i/10g, MySQL 4.x/5.x, HBase on AWS S3 and HDFS

Data Modelling: Erwin, Visual Studio

Development Methodologies: Agile Methodology (Scrum), Hybrid

PROFESSIONAL EXPERIENCE:

Confidential

Data Engineer

Responsibilities:

  • Designed and deployed a Spark cluster and various Big Data analytics tools, including Spark, Kafka streaming, AWS, and HBase, with the Cloudera distribution.
  • Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
  • Integrated Kafka with streaming ETL and performed the required transformations to extract meaningful insights.
  • Developed application components interacting with HBase.
  • Performed optimizations on Spark/Scala.
  • Used the Kafka producer app to publish clickstream events into the Kafka topic and later explored the data with Spark SQL.
  • Processed raw data at scale, including writing scripts, web scraping, calling APIs, and writing SQL queries.
  • Imported streaming logs and aggregated the data into HDFS and MySQL through Kafka.
  • Explored Spark to improve the performance of, and optimize, existing Hadoop algorithms using SparkContext, PySpark, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Implemented machine learning algorithms to optimize electrode targeting and parameter settings for deep brain stimulation.
  • Developed custom machine learning (ML) algorithms in Scala and made them available for MLlib in Python via wrappers.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Imported data from sources such as HDFS and MySQL through Sqoop and Kafka, loading streaming logs into Spark RDDs.
  • Performed visualization using SQL integrated with Zeppelin on different input data and created rich dashboards.
  • Performed transformations, cleaning, and filtering on imported data using Spark SQL and loaded the final data into HDFS and a MySQL database.
  • Involved in production support and enhancement development.

Environment: Hadoop, Spark, PySpark, Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Java, SQL Scripting, Linux Shell Scripting, Zeppelin.

Confidential

Hadoop Developer

Responsibilities:

  • Developed different MapReduce applications on Hadoop.
  • Mined the locations of users on social media sites in a semi-supervised setting on a Hadoop cluster using MapReduce.
  • Implemented a single-source shortest path algorithm on a Hadoop cluster.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
  • Evaluated the suitability of Hadoop and its ecosystem for the project and implemented various proof-of-concept (POC) applications for eventual adoption under the Big Data Hadoop initiative.
  • Estimated software and hardware requirements for the NameNode and DataNodes and planned the cluster.
  • Gathered requirements from experts and business partners and converted them into technical specifications.
  • Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
  • Wrote MapReduce programs and Hive UDFs in Java where the functionality was too complex.
  • Ran Hadoop jobs to process millions of records of text data.
  • Loaded data from the Linux file system into HDFS.
  • Prepared design documents and functional documents.
  • Added extra nodes to the cluster, based on requirements, to scale it.
  • Developed Hive queries for analysis to categorize different items.
  • Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
  • Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
  • Delivered a Flume POC to handle real-time log processing for attribution reports.
  • Maintained system integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).

Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Oozie, Sqoop, HBase, Flume, Linux, Shell scripting, Java, Eclipse, SQL.

Confidential

Hadoop Developer

Responsibilities:

  • Participated in requirement discussions and designed the solution.
  • Estimated the Hadoop cluster requirements.
  • Responsible for choosing the Hadoop components (Hive, Pig, MapReduce, Sqoop, Flume, etc.).
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Built the Hadoop cluster and ingested data using Sqoop.
  • Imported streaming logs to HDFS through Flume.
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile devices, and network devices, and pushed it to HDFS.
  • Developed use cases and technical prototypes for implementing Hive and Pig.
  • Analyzed data using Hive, Pig, and custom MapReduce programs in Java.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Installed and configured Hive, Sqoop, Flume, Oozie on the Hadoop cluster.
  • Scheduled Oozie workflows to run multiple Hive and Pig jobs.
  • Tuned the Hadoop clusters and monitored memory management and MapReduce jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring, and troubleshooting.
  • Developed a custom framework to address the small files problem in Hadoop.
  • Deployed and administered 70-node Hadoop clusters. Administered two smaller clusters.

Environment: MapReduce, HBase, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse.

Confidential

Java Developer

Responsibilities:

  • Involved in various stages of enhancements to the application, performing the required analysis, development, and testing.
  • Prepared the high-level and low-level design documents and generated digital signatures.
  • Developed the logic and code for the registration and validation of enrolling customers.
  • Developed web-based user interfaces using J2EE Technologies.
  • Handled client-side validations using JavaScript and used the Validation Framework for server-side validations.
  • Created test cases for the Unit and Integration testing.
  • Integrated the front end with the Oracle database using the JDBC API through the JDBC-ODBC Bridge driver on the server side.

Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, Eclipse, JDBC, Oracle 10g.
