Lead Hadoop Engineer / Tech Lead Resume
Beaverton, OR
SUMMARY
- Over 15 years of multidisciplinary technology expertise in developing and delivering high-performance, scalable, reliable, and highly available systems that work on large amounts of data.
- Experienced in architecting and implementing Big Data analytics solutions using distributed technologies such as Spark, Hadoop HDFS, MapReduce, Hive, Pig, and Oozie.
- Hands-on experience with the Java, Scala, Python, and C++ programming languages.
- Experienced in leading full-lifecycle planning and delivery of projects and providing hands-on technical leadership to engineering teams.
- Experienced in building and managing Scrum/Agile, globally distributed, and functionally interdependent teams.
- Experienced in online and email data-driven campaigns with highly personalized/dynamic data and targeted messages.
- Sun Certified Java Programmer & Enterprise Architect and Cloudera Certified Hadoop Developer
- Architected, designed, and developed core Java-based applications for the financial trading industry, and built JEE-based applications.
- Built high-volume, high-performance, low-latency, scalable, multithreaded, fault-tolerant systems.
- Excellent object-oriented design, programming, and problem-solving skills; employed these skills to build highly extensible and maintainable systems.
- Won an excellent-performance award from Confidential Group, GE Capital.
- JVM fine-tuning.
- Experienced in test-driven development and continuous integration.
- A self-motivated professional and natural communicator with strong technical, leadership, initiative, and problem-solving skills; a proven team player.
- High personal and professional ethics, with an aptitude for learning.
- Migrated on-premises environments to cloud-based environments.
- Experienced in Test Automation, Build Automation and Continuous Integration.
- Ability to quickly ramp up and start producing results on any given tool or technology.
- Excellent communication skills and understanding of business processes.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, Hive, Pig, Java MapReduce, Apache Spark, Machine Learning, Crunch, Cascading, Impala, Sqoop, Python streaming, EMR, Spark SQL, Spark Streaming, Kafka, Spark ML, Twitter Elephant Bird, Apache DataFu.
Hadoop Distributions: Cloudera CDH 5.x/4.x, Hortonworks, Amazon EC2.
Languages: Java, Python, Scala, C, C++, Ruby, JavaScript, UML
AWS: EMR, EC2, SQS, SNS, Lambda, Machine Learning.
NoSQL: HBase, Cassandra, DynamoDB, SimpleDB.
Methodology: Waterfall, Scrum, Agile.
ORM Technology: Hibernate, JPA.
App/Web Servers: WebLogic, Tomcat, WebSphere, Apache.
Databases: Oracle, MySQL.
Operating Systems: Linux, Ubuntu, Mac OS X, Windows.
Tools: Maven, Ant, JUnit, Log4j.
IDEs: Eclipse, IntelliJ, Toad.
Scripting Languages: HTML, DHTML, JavaScript.
Web services: REST, SOAP
Learning: Advanced Python and Scala programming.
Functional Programming: Akka for concurrent, distributed, resilient, message-driven applications; building scalable systems using AWS.
PROFESSIONAL EXPERIENCE
Lead Hadoop Engineer/TechLead
Confidential, Beaverton, OR
Responsibilities:
- Provide technical leadership to the Big Data development team.
- Design and develop data ingestion, aggregation, integration, and advanced analytics in Hadoop and Spark using Java, Scala, and Python.
- Built continuous integration and test driven development environment
- Researched and deployed new tools, frameworks and patterns to build a sustainable big data platform
- Built business aggregation reports by joining large datasets (a 26 TB clickstream dataset with product and customer data) using Scala Spark 1.3.0 on an EMR cluster.
- Worked on Crunch and Cascading data pipelines and converted them into Hive and Pig data pipelines.
- In-depth experience with Avro, Hive RCFile, Parquet, SequenceFile, and Google Protocol Buffers for binary data storage.
- Provided technical architecture and development leadership in implementing Hadoop and Spark in AWS on EC2 nodes.
- Experience with Hadoop tools including Hive, Sqoop, Pig, Cascading, Crunch, and Impala.
- Experience with both JobTracker (MRv1) and YARN, and with job tuning and optimizing job processing.
- Developed Java UDFs, UDAFs, and UDTFs for Hive processes.
- Migrated on-premises Hadoop jobs to AWS.
- Migrated long-running Hadoop jobs to EMR.
- Led most development activities.
- Developed Java and Python user-defined functions for Pig processes.
- Developed a Spark Streaming job that subscribes to the Kafka message broker and ingests the data into the Hadoop cluster.
- Developed Pig processes for ETL data pipeline.
- Created multiple Spark prototypes and proof of concepts.
- Involved in cluster migration.
- Designed and led development of the job auditing process.
- Worked with data scientists to create numerous POCs.
- Implemented Test Automation, Build Automation
- Fine-tuned Hive queries and debugged production issues for MR jobs.
- Design, configuration, and troubleshooting of the Big Data Hadoop solution in the customer environment.
- Productionalized Hadoop applications (administration, configuration management, monitoring, debugging, and performance tuning).
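As context for the Python streaming and UDF work listed above, here is a minimal sketch of the kind of streaming map/reduce logic involved. The function names and the field layout of the clickstream records are illustrative assumptions, not taken from any actual project.

```python
from itertools import groupby

def map_line(line):
    """Map step: emit (event_type, 1) for a tab-separated click record.
    Assumes the event type is the first field (hypothetical layout)."""
    fields = line.rstrip("\n").split("\t")
    event_type = fields[0] if fields and fields[0] else "unknown"
    return (event_type, 1)

def reduce_pairs(pairs):
    """Reduce step: sum counts per key. Sorting first mirrors the
    sorted-by-key contract Hadoop Streaming provides between phases."""
    out = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        out[key] = sum(count for _, count in group)
    return out

if __name__ == "__main__":
    # In Hadoop Streaming, lines would arrive on stdin; shown inline here.
    lines = ["click\tpage1", "view\tpage2", "click\tpage3"]
    print(reduce_pairs(map_line(l) for l in lines))
```

In a real Hadoop Streaming deployment, the mapper and reducer would be separate scripts reading stdin and writing tab-separated key/value pairs to stdout.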
Confidential, Chicago
Responsibilities:
- Architected and developed Big Data solutions for building various credit reports.
- Developed Hive and Pig scripts for building data pipelines.
- Created Hive UDFs and UDAFs.
- Used Python UDFs in Pig.
- Created custom Java MR jobs that use custom libraries and parse JSON and XML payloads.
- Used Avro, Google Protocol Buffers, SequenceFile, and RCFile formats.
- Designed HBase schema for storing market data in time series format.
- Fine-tuned customer Hive queries.
- Debugged production jobs.
- Built a custom ingestor that subscribes to various messages from brokers and feeds the data into the Hadoop cluster.
- Fine-tuned the JVM for reducer memory requirements.
- Developed audit and logging processes using Java and Python.
- Used Sqoop to export/import various tables from customer and credit databases.
- Involved in development of a bond trading application.
- Used the Spring Integration module for integrating messaging systems.
- Implemented continuous integration using Git, Hudson, and Maven.
- Automated various test cases.
Environment: Java 1.6, Hadoop 0.20.2, HBase, Hive, HDFS, Sqoop, MapReduce programming, Pig, CDH 3.x, Tibco RV, JMS, Spring, Hudson, Maven, Oracle, SQL Server, DB2.
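The HBase time-series schema work above commonly uses a reversed-timestamp row key so that the newest rows sort first in HBase's lexicographic key order. A minimal sketch of that pattern follows; the key format and field choices are assumptions for illustration, not the actual production schema.

```python
MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE, used to reverse the timestamp

def make_row_key(symbol, epoch_millis):
    """Build an HBase row key for time-series market data.

    Reversing the timestamp (MAX_LONG - ts) makes newer rows sort
    before older ones, so a scan from the symbol prefix reads the
    most recent ticks first. Zero-padding keeps the byte order
    consistent with numeric order."""
    reversed_ts = MAX_LONG - epoch_millis
    return f"{symbol}:{reversed_ts:020d}"

# Newer tick sorts lexicographically before the older one:
k_new = make_row_key("AAPL", 1_700_000_000_000)
k_old = make_row_key("AAPL", 1_600_000_000_000)
assert k_new < k_old
```

Prefixing with the symbol also keeps each instrument's ticks contiguous, which suits per-symbol range scans.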
Senior Consultant
Confidential, Chicago
Responsibilities:
- Architected and designed the systems.
- Led 4 engineers to develop the system.
- Used JavaScript, AJAX, and jQuery for web page development.
- Designed a communication-layer framework using Java, Tibco, and Python.
- Implemented J2EE pluggable authentication.
- Used Hibernate as the ORM (object-to-database mapping) framework.
Environment: Java, Spring Framework, JavaScript, AJAX, Oracle, Python, Rule Engine.
Confidential
Responsibilities:
- Implemented Big Data analysis of certification logs to find business insights.
- Developed Java web application for certifying trading applications.
- Developed Ajax (Web 2.0) pages that automatically refresh status for order and market data messages.
- Developed a scalable, fault-tolerant custom rule engine.
- Used Hive and Java MapReduce for analyzing certification logs.
- Used Sqoop to export reports to an Oracle database.
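The custom rule engine mentioned above follows a common predicate/action pattern: each rule pairs a condition over an incoming message with an action to take when it matches. A minimal, hypothetical sketch (rule names, fields, and thresholds are all invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """A rule pairs a predicate over a message dict with an action label."""
    name: str
    predicate: Callable[[dict], bool]
    action: str

def evaluate(rules, message):
    """Return the actions of every rule whose predicate matches the message."""
    return [r.action for r in rules if r.predicate(message)]

# Hypothetical rules for certifying order/market-data messages:
rules = [
    Rule("reject_stale", lambda m: m.get("latency_ms", 0) > 500, "REJECT"),
    Rule("flag_large", lambda m: m.get("qty", 0) > 10_000, "REVIEW"),
]
print(evaluate(rules, {"latency_ms": 700, "qty": 20_000}))  # both rules fire
```

Keeping rules as data makes the engine extensible: new checks are added by appending a `Rule`, with no change to the evaluation loop.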
Confidential
Responsibilities:
- Developed a web application for trading administrators.
- Used Apache Struts for MVC Architecture.
- Used EJB as session façade layer.
- Used Tibco as communication layer to talk to various components.
- Built a router to route messages to multiple engines.
Environment: Weblogic Application Server 9.0, J2EE (EJB, JMS, JNDI, JSP, Servlet), TIBCO RV, Sonic MQ, log4j, Sun OS, Swing, AJAX, JavaScript, CSS, HTML, DHTML, Eclipse, Oracle, TogetherSoft, ANT, CVS