Hadoop Developer Resume
SUMMARY
A Data Engineer with an insatiable intellectual curiosity and the ability to mine hidden value from large sets of structured, semi-structured, and unstructured data. Able to leverage a heavy dose of mathematics and applied statistics with visualization and a healthy sense of exploration.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: Spark (Scala), Kafka, Spark Streaming, MLlib, Sqoop, HBase, HDFS, MapReduce, Pig, Hive, Zeppelin
(Distributions: Databricks, Hortonworks, and Cloudera)
Programming Languages and Scripting: Java (JDK 5/JDK 6), C/C++, Python, Scala, HTML, SQL
Operating Systems : UNIX, Linux, Windows, Mac OS X
Application Servers : IBM WebSphere, Tomcat
Web technologies : JSP, Servlets, JDBC, JavaScript, CSS
Databases : Oracle 9i/10g, MySQL 4.x/5.x, HBase on AWS S3
Development and BI Tools : TOAD, Visio, Rational Rose, Endur, Informatica 9.1
Data Modelling : Erwin, Visual Studio
Development Methodologies : Agile (Scrum), Hybrid
PROFESSIONAL SUMMARY
- 5+ years of experience in analysis, architecture, design, development, testing, maintenance, and user training of software applications, including 4+ years in Big Data, Spark, Hadoop, and HDFS environments and about 1 year in Java/J2EE.
- 3 years of hands-on experience with Hadoop (HDFS, MapReduce, Pig, Hive, Sqoop, etc.).
- 1+ years of hands-on experience with Spark (1.5, 1.6) and Scala as a full-stack developer.
- HDPCD Certified Spark Developer
- Experience in designing and Testing solutions with the Microsoft Azure including HDInsight.
- Experience developing Machine Learning Algorithms using Azure ML Studio.
- Migrated data from relational databases (Oracle, DB2, and MySQL) to Hadoop, Spark, and NoSQL databases.
- NoSQL database experience with HBase, Cassandra.
- Experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
- Expertise in Unix-based operating systems
- Expertise in developing Spark programs covering data processing, data management, and related tasks.
- Experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, Kafka, and Spark Streaming.
- Hands-on experience in data modelling (Erwin, Visual Studio), data analysis, data cleansing, and entity-relationship diagrams (ERD).
- Experience in metadata maintenance and enhancing existing Logical and Physical data models.
- Extensive experience with ETL and querying big data using tools such as HiveQL.
- Applies new data science and machine learning skills to derive actionable insights in industry and beyond.
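As a plain-Python sketch of the kind of windowed aggregation behind the streaming ETL work listed above (the function name and sample events are illustrative, not taken from any production codebase):

```python
from collections import Counter

def tumbling_window_counts(events, window_secs):
    """Bucket (epoch_seconds, key) events into fixed-size windows and
    count keys per window -- the core aggregation a Spark Streaming
    micro-batch job performs over a clickstream."""
    windows = {}
    for ts, key in events:
        # Align each timestamp to the start of its window.
        start = (ts // window_secs) * window_secs
        windows.setdefault(start, Counter())[key] += 1
    return windows

# Hypothetical events: two clicks in the first minute, one view in the second.
counts = tumbling_window_counts(
    [(0, "click"), (30, "click"), (61, "view")], window_secs=60)
```

The same bucketing logic maps directly onto a Spark Streaming window operation once the events arrive from Kafka instead of a list.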
EXPERIENCE
Data Engineer
Confidential
Responsibilities:
- Designed and deployed a Spark cluster and various Big Data analytics tools, including Spark, Kafka (streaming ETL), HBase, Zeppelin, and Sqoop, with the Cloudera distribution.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Integrated Kafka with streaming ETL pipelines and performed the required transformations to extract meaningful insights.
- Developed application components interacting with HBase.
- Performed optimizations on Spark/Scala.
- Used a Kafka producer app to publish clickstream events into a Kafka topic, then explored the data with Spark SQL.
- Processed raw data at scale, including writing scripts, web scraping, calling APIs, and writing SQL queries.
- Imported streaming logs and aggregated the data to HDFS and MySQL through Kafka.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, PySpark, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Implemented Machine learning algorithms to optimize electrode targeting and parameter settings for deep brain stimulation.
- Developed custom machine learning (ML) algorithms in Scala and exposed them to Python via MLlib wrappers.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Imported data from sources such as HDFS and MySQL through Sqoop, and ingested streaming logs into Spark RDDs through Kafka.
- Performed visualization using SQL integrated with Zeppelin on different input data and created rich dashboards.
- Performed transformations, cleaning, and filtering on imported data using Spark SQL, and loaded the final data into HDFS and a MySQL database.
Environment: Hadoop, Spark, PySpark, Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Zeppelin.
Hadoop Developer
Confidential
Responsibilities:
- Participated in requirement discussions and designed the solution.
- Estimated the Hadoop cluster requirements.
- Responsible for choosing the Hadoop components (Hive, Impala, MapReduce, Sqoop, Flume, etc.).
- Responsible for building scalable distributed data solutions using Hadoop.
- Built the Hadoop cluster and ingested data using Sqoop.
- Imported streaming logs to HDFS through Flume
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS
- Developed use cases and technical prototypes for implementing Hive and Pig.
- Worked on analyzing data using Hive, Pig, and custom MapReduce programs in Java.
- Implemented partitioning, dynamic partitions and buckets in HIVE
- Installed and configured Hive, Sqoop, Flume, Oozie on the Hadoop cluster.
- Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
- Tuned the Hadoop clusters and monitored memory management and MapReduce jobs.
- Responsible for Cluster maintenance, Adding and removing cluster nodes, Cluster Monitoring and Troubleshooting.
- Developed a custom framework capable of solving the small-files problem in Hadoop.
- Deployed and administered a 70-node Hadoop cluster, and administered two smaller clusters.
Environment: MapReduce, HBase, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse
Java Developer
Confidential
Responsibilities:
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared high-level and low-level design documents and implemented digital signature generation.
- Developed the logic and code for registration and validation of enrolling customers.
- Developed web-based user interfaces using J2EE technologies.
- Handled client-side validations using JavaScript.
- Used a validation framework for server-side validations.
- Created test cases for the Unit and Integration testing.
- The front end was integrated with an Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
Environment: Java Servlets, JSP, JavaScript, XML, HTML, UML, Apache Tomcat, Eclipse, JDBC, Oracle 10g.