Hadoop Technical Lead Resume

Wilmington, DE

PROFESSIONAL SUMMARY:

  • Over 8 years of professional IT experience, including 4+ years in the Big Data ecosystem covering ingestion, querying, processing and analysis of big data.
  • Qualified and certified Hadoop and Java developer with good experience in analytics, design, development and maintenance of applications for various business enterprises.
  • Statistical analysis expert: distributions, hypothesis testing, F-test, chi-square testing, regression and correlation, decision trees, probability and operations research.
  • AWS Certified Solutions Architect; configured Elastic Load Balancers (ELB) with EC2 Auto Scaling groups.
  • Implemented AWS solutions using EC2, S3, EBS, Elastic Load Balancer and Auto Scaling groups.
  • Expertise in the Hadoop ecosystem as a Hadoop Architect.
  • Expertise in Python, R, Tableau, the LAMP stack and Java, with working knowledge of TensorFlow.
  • At JDSU, worked as the Oracle BI Supply Chain lead implementing Advanced Supply Chain Analytics, plus Financials and Sales Analytics implementations involving OBIEE, OBIA and Hyperion.
  • Extensive R&D experience in Machine Learning, Predictive Analytics and Modeling.
  • Experience in Natural Language Processing (NLP), Semantic Analytics and Deep Learning.
  • Experience in building Data Science teams and fostering their adoption within organizations.
  • Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
  • Experience with configuration of Hadoop ecosystem components: Hive, HBase, Pig, Sqoop, Mahout, Zookeeper, Flume, Storm and Spark.
  • Hands on experience in installing and configuring Hadoop ecosystem components like Oozie, Hive, Sqoop, Zookeeper, Pig, and Flume.
  • Good exposure to MapReduce programming (Java), Hive, Pig scripting and Spark SQL (Scala/Python).
  • Experience in writing custom UDFs in Java for Hive and Pig to extend functionality.
  • Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
  • Hands-on experience with Spark, Spark Streaming, Spark MLlib and Scala.
  • Created DataFrames in Spark with Scala.
  • Hands-on experience developing UDFs, DataFrames and SQL queries in Spark SQL (a minimal sketch follows this summary).
  • Experience in building Data pipelines using Kafka and Spark.
  • Experience in managing and reviewing Hadoop log files.
  • Hands on experience in Import/Export of data using Hadoop Data Management tool Sqoop.
  • Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL, map-reduce systems, data modeling, database performance, and multi-terabyte data warehouses.
  • Experience in designing, developing and implementing connectivity products that allow efficient exchange of data between our core database engine and the Hadoop ecosystem. Worked extensively on building a rapid development framework using Core Java.
  • Extensive experience and actively involved in Requirement gathering, Analysis, Design, Reviews, Coding, Code Reviews, Unit and Integration Testing.
  • Extensive experience in designing front end interfaces using HTML, JSP, CSS, Java Script and Ajax.
  • Very familiar with data architecture, Hadoop information architecture, data modeling, data mining, machine learning and advanced data processing.
  • Good Experience using Object Relational Mapping tool like Hibernate.
  • Experience in Spring Framework such as Spring IOC, Spring Resources, Spring JDBC.
  • Experience with various IDEs like IntelliJ, Eclipse, JBuilder and Velocity Studio.
  • Implemented the service projects on Agile Methodology and involved in running the scrum meetings.
  • Implemented the core product projects on Lean and Kanban Methodology and involved in delivering high quality health care product.
  • Experience in developing web-services using REST, SOAP, WSDL and Apache AXIS2.
  • Experience in writing the SQL queries.
  • Experience in designing and developing UI Screens using Java Server Pages, Html, CSS and JavaScript.
  • Set up Solr for distributed indexing and search.
  • Wrote Java code to format XML documents and upload them to the Solr server for indexing.
  • Loaded and accessed process event failure messages through a Kafka-Solr writer and queried the Solr collection database.
  • Created the Solr collection to load reports from the processed data, and developed a Solr writer to write the encoded data to the Solr collection from HDFS.
  • Familiar with data mining tools like Apache Mahout and WEKA.
  • Used CVS, Maven, and SVN for Source code version control.
  • Experience in designing transaction processing systems deployed on various application servers including Tomcat, Web Sphere, Web logic.
  • Good experience with Quality Control, JIRA and FishEye for tracking tickets: accepting tickets/defects, submitting tickets, reviewing code and closing tickets.
  • Designed dynamic user interfaces using AJAX and jQuery to retrieve data and send asynchronous requests without reloading the page.
  • Excellent Experience in Code Refactoring.
  • Excellent Client interaction skills and proven experience in working independently as well as in a team.
  • Excellent communication, analytical, interpersonal and presentation skills.
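
Illustrative only: a minimal Spark SQL/DataFrame sketch in Scala of the kind of cleansing and querying work summarized above; the input path, delimiter and column names (status, claim_id) are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cleansing-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical pipe-delimited raw records on HDFS.
    val raw = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("hdfs:///data/raw/records")

    // Simple UDF that normalizes a free-text status column.
    val normalizeStatus = udf((s: String) => Option(s).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

    val cleansed = raw
      .withColumn("status", normalizeStatus($"status"))
      .filter($"claim_id".isNotNull)

    // Summarize with Spark SQL against a temporary view.
    cleansed.createOrReplaceTempView("records")
    spark.sql("SELECT status, COUNT(*) AS cnt FROM records GROUP BY status").show()

    spark.stop()
  }
}
```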

TECHNICAL SKILLS

Hadoop Ecosystem: Kafka, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, Zookeeper, Ambari, Hue, Spark, Storm, Ganglia

Hadoop Platforms: Cloudera, Hortonworks, MapR

Web Technologies: JDBC, Servlets, JSP, JSTL, JNDI, XML, HTML, CSS and AJAX

NoSQL Databases: HBase, Cassandra, MongoDB

Databases: Oracle 8i/9i/10g, MySQL

Languages: Python, Java, SQL, R, PL/SQL, Ruby, Shell Scripting

Operating Systems: UNIX (OS X, Solaris), Windows, Linux (CentOS, Fedora, Red Hat)

Frameworks: Struts, Hibernate, Spring, ConceptWave, ATG 7.0

Application Server: Apache Tomcat

Streaming technologies: Flume, Storm, Spark Streaming

Analytics: Spark,Mahout

Search: Elasticsearch, Solr

Project Management / Tools / Applications: MS Office suites (incl. 2003), MS Exchange & Outlook, Lotus Domino Notes, Citrix Client, SharePoint, MS Internet Explorer, Firefox, Chrome, Apache, IIS

PROFESSIONAL EXPERIENCE

Confidential, Wilmington - DE

Hadoop Technical Lead

Responsibilities:

  • Developed a Data Lake on the Cloudera platform that gives complete insight into oncology data to all the key businesses responsible for the new product launch.
  • Worked on ingesting a variety of data from multiple vendors into the Cloudera data lake (EDH) hosted on AWS.
  • Created an ingestion framework using Python and Hadoop to ingest data from different sources into HDFS.
  • Worked on maintaining the patient's entire clinical journey and derived valuable insights that drive the business with exponential growth.
  • Worked on optimizing Impala using shuffle and broadcast techniques, and Hive using SerDe format techniques.
  • Served as onsite anchor and a bridge between the business and the technology development done by the offshore team.
  • Worked on transformation, de-normalization and mashing of huge amounts of oncology data at terabyte scale, optimized for deriving insights and visualizations.
  • Created end-to-end Spark-Solr applications using Scala to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
  • Worked on creating ETL workflows using core Java, shell scripts and HQL.
  • Imported and exported data using Sqoop between HDFS and relational database systems.
  • Created the business-critical analytics service-layer data model and implemented it using Hive and Impala.
  • Developed business-critical Java UDFs in Hive as needed for complex querying.
  • Applied partitioning and bucketing techniques in Hive for performance improvement.
  • Worked on integrating Spark Streaming with Apache Kafka to bring in real-time analytics (see the sketch after this list).
  • Used Oozie and Autosys for workflow Management of batch jobs.
  • Worked on optimizations with dynamic ingestion and schema evolution using Spark & Avro.
  • Worked with the Production support team to resolve production Issues.
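
Below is a minimal, hypothetical sketch of the Kafka and Spark Streaming integration mentioned above, written with Spark Structured Streaming; the broker address and topic name are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-streaming-sketch").getOrCreate()
    import spark.implicits._

    // Read events from a hypothetical Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clinical-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload")

    // Count events per 1-minute window as a stand-in for the real-time analytics layer.
    val counts = events
      .withColumn("ts", current_timestamp())
      .groupBy(window($"ts", "1 minute"))
      .count()

    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```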

Environment: HDFS, Impala, Hive, AWS, Cloudera, Scala 2.10.5, Python, Kafka, StreamSets, InfluxDB, Sensu, RabbitMQ, Kibana, Pig, HBase, Sqoop, Oozie, Autosys, Bash.

Confidential - Dallas, TX

Hadoop Lead Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in loading data from the Linux file system to the Hadoop Distributed File System.
  • Created HBase tables to store various data formats of PII data coming from different portfolios.
  • Experience in managing and reviewing Hadoop log files.
  • Created instances in OpenStack for setting up the environment.
  • Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
  • Troubleshooting of Nova and Glance issues in OpenStack, and of the Kafka and RabbitMQ bus.
  • Performance testing of the environment: created Python scripts to generate IO and CPU load.
  • Experience with the OpenStack cloud platform.
  • Experienced in provisioning hosts with flavors GP (General Purpose), SO (Storage Optimized), MO (Memory Optimized) and CO (Compute Optimized).
  • Involved in Design, development, implementation and documentation in various big data technologies.
  • Used the Django framework to develop web applications implementing the MVC architecture.
  • Used Django APIs for database access.
  • Designed and developed adapters to inject and eject data to/from Kafka from various data sources.
  • Design and development of HBase tables according to various needs of the tenants while taking into consideration various issues related to performance.
  • Documenting the process of designing and developing the HBase tables.
  • Interact with various business teams to document the requirements for HBase tables.
  • Developed Spark applications to move data into HBase tables from various sources such as relational databases or Hive.
  • Managed and helped a team of two developers with Spark programming.
  • Design, development and documentation of various Sqoop scripts to pull data into the Hadoop ecosystem.
  • Involved in creating Hive tables, and loading and analyzing data using hive queries.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Worked on a cluster of 135 nodes.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Worked on migrating MapReduce programs into Spark transformations using Scala (see the sketch after this list).
  • Worked on reading multiple data formats on HDFS using Scala.
  • Used Spark to create APIs in Scala for big data analysis.
  • Created RDDs, DataFrames and Datasets.
  • Good experience with Talend Open Studio for designing ETL jobs for processing of data.
  • Used ORC and Parquet file formats for storing the data.
  • Used Java code for SQL queries, including code to retrieve the SQL queries from a text file.
  • Used Eclipse for the development, testing and debugging of the application.
  • Used Python to write scripts that move data from cluster to cluster.
  • Used the Log4j framework for logging debug, info and error data.
  • Created Hive external and managed tables.
  • Used Maven for dependency management and project structure.
  • Designed and maintained Tez workflows to manage the flow of jobs in the cluster.
  • Loaded Spark RDDs and performed in-memory computation to generate the output response.
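
A minimal sketch of a MapReduce-to-Spark migration of the kind described in the list above: a mapper/reducer pair that counted error codes becomes a short chain of RDD transformations. The paths and field index are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MigrationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mapreduce-to-spark-sketch"))

    // Hypothetical input: the original MapReduce job emitted (errorCode, 1) in the
    // mapper and summed the counts in the reducer.
    val lines = sc.textFile("hdfs:///logs/app/*.log")

    val errorCounts = lines
      .filter(_.contains("ERROR"))          // mapper-side filter
      .map(line => (line.split(" ")(2), 1)) // map: emit (errorCode, 1); field index assumed
      .reduceByKey(_ + _)                   // reduce: sum counts per error code

    errorCounts.saveAsTextFile("hdfs:///logs/app/error-counts")
    sc.stop()
  }
}
```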

Environment: Spark RDD, Spark SQL, Spark DataFrames, Maven, Eclipse, Elasticsearch, Logstash, Ansible, RHEL 7, Python, Kafka, StreamSets, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana, Hive, Pig, HBase, Sqoop.

Confidential - Irving, TX

Hadoop Lead Developer

Responsibilities:

  • Implemented a CDH3 Hadoop cluster on CentOS.
  • Implemented POCs to configure DataStax Cassandra with Hadoop.
  • Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured them.
  • Installed the application on AWS EC2 instances and configured the storage on S3 buckets.
  • Responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components like Redshift and DynamoDB.
  • POC on data search using Elasticsearch.
  • Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline; this pipeline also uses Amazon Web Services EMR, S3 and RDS.
  • Worked on MongoDB using CRUD (Create, Read, Update and Delete), indexing, replication and sharding features.
  • Launched instances with respect to specific applications.
  • Developed and managed cloud VMs with AWS EC2 command line clients and management console.
  • Led initiatives in developing cloud-based, SaaS solutions for the design market.
  • Launched and set up the Hadoop cluster, which included configuring the different Hadoop components.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Developed RDDs/DataFrames in Spark using Scala and Python and applied several transformation logics to load data from the Hadoop data lake into Cassandra (a sketch follows this list).
  • Involved in developing the MapReduce framework, writing queries and scheduling MapReduce jobs.
  • Involved in performance optimization of queries and stored procedures by analyzing query plans, blocking queries and identifying missing indexes.
  • Performed real-time streaming of data using Spark with Kafka.
  • Created tables in Hive and then loaded data from HDFS into Hive.
  • Hands-on experience in loading data from the UNIX file system to HDFS.
  • Experienced in performing Cassandra query operations using the Thrift API for real-time analytics.
  • Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, compressed CSV, etc.
  • Provided cluster coordination services through Zookeeper.
  • Implemented persistence and search of data using Solr.
  • Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
  • Involved in creating Hive tables, loading data and running Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties and the Thrift server in Hive.
  • Experience in developing Maven and Ant scripts to automate the compilation, deployment and testing of web applications.
  • Developed Apache Spark jobs using Scala in the test environment for faster data processing and used Spark SQL for querying.
  • Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
  • Working knowledge in writing Pig's Load and Store functions.
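
A hedged sketch of loading curated data from the Hadoop data lake into Cassandra, as referenced above; it assumes the DataStax spark-cassandra-connector is on the classpath, and the host, keyspace, table and input path are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object LakeToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lake-to-cassandra-sketch")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder host
      .getOrCreate()

    // Read a curated Parquet dataset from the data lake (hypothetical path).
    val accounts = spark.read.parquet("hdfs:///lake/curated/accounts")

    // Append it to a Cassandra table whose columns match the DataFrame schema.
    accounts.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "portfolio", "table" -> "accounts"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```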

Environment: Apache Hadoop, MapReduce, Scala, HDFS, Python, CentOS, Zookeeper, Sqoop, Kafka, MemSQL, Cassandra, Redshift, DynamoDB, Solr, Hive, Pig, Oozie, Spark SQL, GitHub, JSON, Netezza, Cloudera CDH3, Oracle, Maven, Ant, Eclipse, Amazon EC2, MongoDB, EMR, S3.

Confidential -Raleigh, NC

Hadoop Developer

Responsibilities:

  • Installed and configured Pig and wrote Pig Latin scripts.
  • Involved in managing and reviewing Hadoop Job tracker log files and control-m log files.
  • Scheduling and managing cron jobs, wrote shell scripts to generate alerts.
  • Monitored and managed daily jobs, processing around 200k files per day, through RabbitMQ and the Apache dashboard application.
  • Used the Control-M scheduling tool to schedule daily jobs.
  • Experience in administering and maintaining a multi-rack Cassandra cluster.
  • Monitored workload, job performance and capacity planning using InsightIQ storage performance monitoring and storage analytics; experienced in defining job flows.
  • Gained good experience with NoSQL databases like Cassandra and HBase.
  • Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
  • Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers/sensors
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Used Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.
  • The Hive tables created per requirement were internal or external tables defined with appropriate static and dynamic partitions, intended for efficiency (see the sketch after this list).
  • Worked on setting up High Availability for GPHD 2.2 with Zookeeper and quorum journal nodes.
  • Used Control-m scheduling tool to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java map-reduce, Hive and Sqoop as well as system specific jobs
  • Worked with BI teams in generating the reports and designing ETL workflows on Tableau.
  • Involved in Scrum calls, grooming and demo meetings; very good experience with agile methodology.
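
A minimal sketch of the partitioned external Hive table pattern noted above, issued through a Hive-enabled SparkSession; the table, columns, location and staging table are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Allow fully dynamic partition inserts.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // External table over data already on HDFS, partitioned by date.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS events_ext (
        |  event_id STRING,
        |  payload  STRING
        |)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC
        |LOCATION 'hdfs:///warehouse/events_ext'""".stripMargin)

    // Dynamic-partition load from a hypothetical staging table.
    spark.sql(
      """INSERT OVERWRITE TABLE events_ext PARTITION (event_date)
        |SELECT event_id, payload, event_date FROM events_staging""".stripMargin)

    spark.stop()
  }
}
```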

Environment: Apache Hadoop 2.3, GPHD 1.2, GPHD 2.2, MapReduce 2.3, HDFS, Hive, Java 1.6 & 1.7, Cassandra, Pig, Spring XD, Linux, Eclipse, RabbitMQ, Zookeeper, PostgreSQL, Apache Solr, Control-M, Redis, Tableau, QlikView, DataStax.

Confidential

Role: Java/J2EE Developer

Responsibilities:

  • Worked with Java, J2EE, Struts, web services and Hibernate in a fast-paced development environment.
  • Followed agile methodology, interacted directly with the client on features, implemented optimal solutions and tailored the application to customer needs.
  • Involved in design and implementation of web tier using Servlets and JSP.
  • Used Apache POI for Excel files reading.
  • Developed the user interface using JSP and JavaScript to view all online trading transactions.
  • Designed and developed Data Access Objects (DAO) to access the database (a sketch follows this list).
  • Used DAO Factory and value object design patterns to organize and integrate the Java objects.
  • Coded Java Server Pages for the dynamic front-end content that uses Servlets and EJBs.
  • Coded HTML pages using CSS for static content generation with JavaScript for validations.
  • Used JDBC API to connect to the database and carry out database operations.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Performed code reviews.
  • Performed unit testing, system testing and integration testing.
  • Involved in building and deployment of application in Linux environment.
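
A small DAO-style sketch of the JDBC access pattern described above; the original work was in Java, but the sketch is in Scala for consistency with the other examples, and the JDBC URL, table and columns are placeholders.

```scala
import java.sql.{Connection, DriverManager}

// Hypothetical domain object and table.
case class Trade(id: Long, symbol: String, quantity: Int)

class TradeDao(url: String, user: String, password: String) {
  // Open a connection, run the work, always close the connection.
  private def withConnection[T](body: Connection => T): T = {
    val conn = DriverManager.getConnection(url, user, password)
    try body(conn) finally conn.close()
  }

  // Query trades by symbol using a prepared statement (parameterized SQL).
  def findBySymbol(symbol: String): List[Trade] = withConnection { conn =>
    val stmt = conn.prepareStatement(
      "SELECT id, symbol, quantity FROM trades WHERE symbol = ?")
    stmt.setString(1, symbol)
    val rs = stmt.executeQuery()
    val buf = scala.collection.mutable.ListBuffer.empty[Trade]
    while (rs.next()) {
      buf += Trade(rs.getLong("id"), rs.getString("symbol"), rs.getInt("quantity"))
    }
    buf.toList
  }
}

// Example usage (placeholder credentials):
//   new TradeDao("jdbc:oracle:thin:@host:1521:orcl", "app", "secret").findBySymbol("IBM")
```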

Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.

Confidential

Java/J2EE Developer

Responsibilities:

  • Worked with Java, J2EE, Struts, web services and Hibernate in a fast-paced development environment.
  • Followed agile methodology, interacted directly with the client on features, implemented optimal solutions and tailored the application to customer needs.
  • Involved in design and implementation of web tier using Servlets and JSP.
  • Used Apache POI for Excel files reading.
  • Developed the user interface using JSP and JavaScript to view all online trading transactions.
  • Designed and developed Data Access Objects (DAO) to access the database.
  • Used DAO Factory and value object design patterns to organize and integrate the Java objects.
  • Coded Java Server Pages for the dynamic front-end content that uses Servlets and EJBs.
  • Coded HTML pages using CSS for static content generation with JavaScript for validations.
  • Used JDBC API to connect to the database and carry out database operations.
  • Used JSP and JSTL Tag Libraries for developing User Interface components.
  • Performed code reviews.
  • Performed unit testing, system testing and integration testing.
  • Involved in building and deployment of application in Linux environment.

Environment: Java, J2EE, JDBC, Struts, SQL, Hibernate, Eclipse, Apache POI, CSS.
