Sr. Big Data/Hadoop Developer Resume
Dublin, OH
SUMMARY
- 8+ years of professional experience in Hadoop and Java technologies, including HDFS, MapReduce, Apache Pig, Hive, HBase, Sqoop, Oracle, JSP, JDBC and Spring.
- 4 years of experience in client-server and web-based application development using Java technologies: Java, J2EE, JSP, JavaScript, Servlets and Spring.
- 6 years of working experience with Hadoop ecosystem technologies such as Apache Pig, Apache Hive, Apache Sqoop, Apache Flume and Apache HBase.
- Experience in analyzing data using Hive QL, Pig Latin and custom MapReduce programs in Java.
- Hands-on experience writing Pig UDFs and Hive UDFs/UDAFs for data analysis.
- Worked with NoSQL databases such as HBase.
- Experience importing and exporting data between relational databases and HDFS using Sqoop.
- Developed MapReduce jobs and applied various optimization techniques to improve their performance.
- Good knowledge of job scheduling and coordination tools such as Oozie and ZooKeeper.
- Extensive experience in configuring Flume to stream data into HDFS.
- Experience implementing Spark in Scala and Spark SQL for faster analysis and processing of data.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode and DataNode.
- Used Apache Impala to read, write and query the Hadoop data in HDFS, HBase and Cassandra.
- Hands on experience in Application Development using Java, Python, Hadoop, RDBMS and Linux shell scripting.
- Good knowledge of SOA/web services and APIs for building software applications.
- Knowledge of SOA functional, integration and regression testing.
- Hands-on experience with BI tools such as Tableau.
- Excellent knowledge of the complete Cassandra architecture, with a detailed understanding of read, write and delete processes.
- Worked on large-scale data migration in the cloud (AWS).
- Extensive experience in developing and deploying Java based applications.
- Worked on all phases of the software development life cycle (SDLC), from requirements gathering through programming, testing and maintenance.
- Strong skills in designing, developing and testing client-server products and distributed systems using Java, J2EE and web technologies.
- Extensively designed and executed SQL queries to ensure data integrity and consistency at the backend.
- Expertise in J2EE Application development using JSP, Servlets, JDBC, XML, Spring.
- Strong experience with web servers such as Tomcat, and application servers such as WebLogic, WebSphere and JBoss.
- Experienced in GUI design; extensively used HTML, XML, JavaScript and JSP.
- Developed Spring JDBC DAO support for database interactions.
- Developed static and dynamic pages using JSP and Servlets.
- Set up the Struts framework on WebLogic Server and Tomcat.
- Involved in Software Development Lifecycle.
- Functional skills include project management, leadership, quality control and L&D.
- Good knowledge of PL/SQL stored procedures in Oracle.
- Ability to quickly master new concepts.
- Excellent problem-solving capabilities and communication skills.
TECHNICAL SKILLS
Java Technologies: Java, JDK 1.2/1.3/1.4/1.5/1.6.
J2EE Technologies: JSP, JavaBeans, Servlets, JDBC, JPA 1.0, EJB 3.0, JNDI, JOLT, Amazon Cloud (S3, EC2, Elastic Beanstalk and RDS).
Languages: C, C++, PL/SQL, Python and Java.
Frameworks: Hadoop (HDFS, MapReduce, Pig, Hive, HBase, Mahout, Falcon, Oozie, Accumulo, ZooKeeper, YARN, Lucene), Struts 1.x and Spring 3.x.
Web Technologies: XHTML, HTML, XML, XSLT, XPath, CSS, DOM, JavaScript, AngularJS, AJAX, jQuery, GWT, WSDL, SOA, Web Services, Perl, VBScript.
Application Servers: WebLogic 8.1/9.1/10.x, WebSphere 5.x/6.x/7.x, Tuxedo Server 7.x/9.x, GlassFish Server 2.x, JBoss 4.x/5.x.
Web Servers: Apache Tomcat 4.0/5.5, Java Web Server 2.0.
Operating Systems: Windows XP/NT/9x/2000, MS-DOS, UNIX, Linux, Solaris and AIX.
Databases: SQL, PL/SQL, Oracle 9i/10g, MySQL, Microsoft Access, SQL Server, NoSQL (HBase, MongoDB).
IDEs: Eclipse 3.x, MyEclipse 8.x, RAD 7.x and JDeveloper 10.x.
Distributions: Cloudera, Hortonworks.
Version Control: WinCVS, VSS, PVCS, Subversion, Git.
PROFESSIONAL EXPERIENCE
Confidential, Dublin, OH
Sr. Big Data/Hadoop Developer
Responsibilities:
- Analyzed business requirements and identified the mapping documents required for system and functional testing efforts across all test scenarios.
- Implemented core Java and J2EE technologies: JSP, Servlets, JSF, JSTL, EJB transactions (CMP, BMP and message-driven beans), JMS, Struts, Spring, Swing, Hibernate, JavaBeans, JDBC, XML, web services, JNDI, multithreading, Drools, etc.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Performed requirements gathering and analysis by actively soliciting, analyzing and negotiating customer requirements, and prepared the requirements specification document for the application in Microsoft Word.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed Use Case diagrams, business flow diagrams, Activity/State diagrams.
- Installed and configured Hadoop MapReduce and HDFS.
- Adopted J2EE design patterns such as Session Facade and Business Facade.
- Installed and configured Hive and implemented various business requirements by writing Hive UDFs (a representative sketch follows this list).
- Configured the application using Spring, Struts, Hibernate, DAOs, Action classes and JavaServer Pages.
- Configured Hibernate, Struts and Tiles related XML files.
- Developed the application using Struts Framework that uses Model View Controller (MVC) architecture with JSP as the view.
- Developed the presentation layer using JSF, JSP, HTML, CSS and jQuery.
- Extensively used Spring IoC for dependency injection and worked on custom MVC frameworks loosely based on Struts.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Extensively worked on the user interface for several modules using HTML, JSPs, JavaScript and Python.
- Implemented business logic using Servlets and session beans and deployed them on WebLogic Server.
- Created complex SQL queries and stored procedures.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Developed the XML schema and Web services for the data support and structures.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Used different file formats such as text files, SequenceFiles and Avro; used ZooKeeper to manage coordination among the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Impala to pull data from Hive tables and developed the front-end application with AngularJS.
- Wrote Python scripts as needed.
- Deployed the applications on WebSphere Application Server.
- Used the Oracle 10g database for table creation and wrote SQL queries using joins and stored procedures.
- Managed application deployment using Python.
- Configured Sqoop and exported/imported data into HDFS.
- Used SoapUI Pro for testing web services.
- Worked with configuration management groups to set up various deployment environments, including system integration testing and quality control testing.
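The Hive UDF work referenced above followed the standard evaluate-method pattern. The sketch below is illustrative only, with a hypothetical class name and column semantics; it is shown in Scala for consistency with the other examples, although such UDFs are often written in Java:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases a free-text code column so it can
// be joined cleanly against reference data in Hive.
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Once packaged into a JAR, a UDF of this shape is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.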
Environment: J2EE, JDK, JSP, JSF, Scala, Python, Spark, MVC, Struts, Eclipse IDE, Hibernate, Hadoop, MapReduce, HBase, Hive, Pig, Sqoop, Impala, ZooKeeper, SQL Developer, Oracle 10g, AngularJS, JavaScript, HTML5, CSS, SQL.
Confidential, McLean, VA
Sr. Big Data/Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed data control checks using Spark and loaded the data into HDFS.
- Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for large volumes of data.
- Created end-to-end Spark-Solr applications in Scala to perform data cleansing, validation, transformation and summarization activities according to the requirements.
- Developed Scala scripts using both DataFrames/Datasets/SQL and RDDs/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop (see the sketch after this list).
- Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Developed UNIX shell scripts to load large numbers of files into HDFS from the Linux file system.
- Created Hive tables and loaded and analyzed data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Designed and improved the internal search engine using big data technologies and Solr/Fusion.
- Migrated data from various data sources to Solr via stages according to requirements.
- Used Akka as a framework to create reactive, distributed, parallel and resilient concurrent applications in Scala.
- Developed custom fields, custom Jira plugins and validations to implement complex workflows.
- Extensively worked on Jenkins for continuous integration and end-to-end automation of all builds and deployments.
- Prepared JILs for AutoSys jobs.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Extracted data from the Oracle database, then transformed and loaded it into the Greenplum database according to business specifications.
- Created mappings to move data from Oracle and SQL Server to the new data warehouse in Greenplum.
- Loaded and transformed large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries.
- Gained solid experience with continuous integration of the application using Jenkins.
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data sets.
- Developed a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on exceptions raised by applications.
- Used Oozie to automate the end-to-end data pipelines and Oozie coordinators to schedule the workflows.
- Implemented a daily workflow for extraction, processing and analysis of data with Oozie.
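A minimal sketch of the Scala/Spark aggregation flow described above, assuming hypothetical table, column and path names (the real scripts read curated Hive tables and handed results to a Sqoop export into the OLTP system):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TransactionRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TransactionRollup")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging table populated upstream by Sqoop/Flume loads.
    val txns = spark.table("staging.transactions")

    val daily = txns
      .filter(col("amount") > 0)                      // basic data-quality check
      .groupBy(col("account_id"), col("txn_date"))
      .agg(sum("amount").as("daily_total"),
           count(lit(1)).as("txn_count"))

    // Write back to HDFS as Parquet; a downstream Sqoop export pushes the
    // rollup into the OLTP system.
    daily.write.mode("overwrite").parquet("/warehouse/rollups/daily_txns")

    spark.stop()
  }
}
```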
Environment: Java, Scala, Apache Spark, Apache Zeppelin, Greenplum 4.3 (PostgreSQL), Spring, Maven, Hive, HDFS, YARN, MapReduce, Sqoop, Flume, Solr, JIRA, UNIX Shell Scripting, Python, AWS, Kafka, Jenkins, Akka.
Confidential, Irving, TX
Hadoop/Big Data Developer
Responsibilities:
- Implemented various big data strategies in all stages of the SDLC following Agile.
- Developed Pig scripts for validating and cleansing the data.
- Developed MapReduce programs to parse the raw data and stored the refined data in HBase (see the sketch after this list).
- Created Hive queries for extracting data from Cornerstone (Data Lake) to HDFS locations.
- Managed and reviewed Hadoop logs.
- Implemented search trends (Google Trends) processing using HPCC on Amazon AWS.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Exported data from HDFS to RDBMS for visualization and user report generation using Tableau.
- Loaded, transformed and analyzed transaction data from various providers in Hadoop on an ongoing basis.
- Filtered, transformed and combined data coming from Cornerstone (data lake) based on business requirements using custom Pig scripts, and stored the results in Cognition (downstream DB).
- Designed and created test cases (in Rally) and tested the Tableau dashboards through functional, system, integration, regression and UAT testing.
- Participated in and conducted weekly issue-log, report status and project status meetings to discuss issues and workarounds.
- Communicated with onshore and offshore developers throughout all phases of development to eliminate roadblocks.
- Generated daily progress reports and presented them in daily Agile Scrum meetings.
- Gained hands-on experience with Apache Pig.
- Created unit test plans and test cases.
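The HBase-load step mentioned above ran inside MapReduce; the standalone sketch below only illustrates the client-side write, with a hypothetical table, column family and row key, shown in Scala for consistency with the other examples:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

// Writes one refined record into a hypothetical "refined_events" table.
object RefinedRecordWriter {
  def main(args: Array[String]): Unit = {
    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf("refined_events"))
    try {
      val put = new Put(Bytes.toBytes("evt-00001"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("CLEAN"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("cornerstone"))
      table.put(put)
    } finally {
      table.close()
      connection.close()
    }
  }
}
```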
Environment: Apache Hadoop, Pig, Hive, SQL, MapReduce, Core Java, Rally, MapR, UNIX/Linux, Tableau, Windows, MS Office, Microsoft Outlook.
Confidential, Chicago, IL
Hadoop/Big Data Developer
Responsibilities:
- Extracted and updated data in HDFS using the Sqoop import and export command-line utilities.
- Developed a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Developed Hive UDFs for the needed functionality.
- Created Hive tables, loaded them with data and wrote Hive queries.
- Managed work including indexing data, tuning relevance, developing custom tokenizers and filters, and adding functionality such as playlists, custom sorting and regionalization with the Solr search engine.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Used Pig for transformations, event joins, filtering of bot traffic and some pre-aggregations before storing the data on HDFS.
- Implemented advanced procedures such as text analytics and processing using in-memory computing capabilities such as Spark (see the sketch after this list).
- Enhanced and optimized production Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Managed and reviewed Hadoop log files.
- Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Exported processed data from Hadoop to relational databases and external file systems using Sqoop.
- Orchestrated hundreds of Sqoop scripts, Pig scripts and Hive queries using Oozie workflows and sub-workflows.
- Loaded cache data into HBase using Sqoop.
- Used Python scripts to schedule Oozie workflows at regular intervals.
- Built custom Talend jobs to ingest, enrich and distribute data in the MapR and Cloudera Hadoop ecosystems.
- Created many external Hive tables pointing to HBase tables.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Worked with cache data stored in Cassandra.
- Ingested data from external and internal flow organizations.
- Used the external tables in Impala for data analysis.
- Supported MapReduce programs running on the cluster.
- Participated in Apache Spark POCs for analyzing sales data based on several business factors.
- Participated in daily scrum meetings and iterative development.
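An illustrative sketch of the kind of in-memory text processing referenced above, assuming hypothetical HDFS paths: a simple RDD-based term-frequency count in Scala.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TermFrequency {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TermFrequency"))

    // Hypothetical input path; the real jobs read curated data sets on HDFS.
    val counts = sc.textFile("/data/events/raw")
      .flatMap(_.toLowerCase.split("\\W+")) // tokenize each line
      .filter(_.nonEmpty)
      .map(term => (term, 1L))
      .reduceByKey(_ + _)                   // aggregate counts in memory

    counts.saveAsTextFile("/reports/term_frequency")
    sc.stop()
  }
}
```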
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Impala, Sqoop, Flume, Oozie, Apache Spark, Java, Linux, SQL Server, Zookeeper, Autosys, Tableau, Cassandra.
Confidential, Minneapolis, MN
Sr. Hadoop Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch after this list).
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Tuned the cluster for optimal performance to process these large data sets.
- Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
- Devised and led the implementation of the next-generation architecture for more efficient data ingestion and processing.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Loaded data from the UNIX file system into HDFS.
- Developed a suite of unit test cases for Mapper, Reducer and Driver classes using the MR testing library.
- Designed and developed a distributed processing system to process binary files in parallel and load the analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate the tasks of loading data into HDFS and preprocessing it with Pig.
- Provided cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
- Implemented NameNode HA (replication) to avoid a single point of failure.
- Troubleshot issues across the Hadoop ecosystem, with an understanding of system capacity, bottlenecks, and the basics of memory, CPU, OS, storage and networking.
- Set up, configured and managed security for Hadoop clusters using Kerberos, with integration to LDAP/AD at an enterprise level.
- Monitored the operating system and Hadoop cluster using tools such as Nagios and Ganglia.
- Scheduled jobs in Hadoop using the FIFO, Fair and Capacity schedulers.
- Possess good Linux and Hadoop system administration and networking skills, with familiarity with open-source configuration management and deployment tools such as Salt and Ansible.
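The cleaning jobs described above were written in Java; for consistency with the other sketches, this illustrative Mapper is shown in Scala (JVM-compatible) with a hypothetical pipe-delimited record layout. It drops malformed rows before downstream processing:

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class CleaningMapper extends Mapper[LongWritable, Text, Text, NullWritable] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, NullWritable]#Context): Unit = {
    // Hypothetical layout: 12 pipe-delimited fields with a mandatory ID in field 0.
    val fields = value.toString.split("\\|", -1)
    if (fields.length == 12 && fields(0).nonEmpty) {
      context.write(new Text(fields.map(_.trim).mkString("|")), NullWritable.get)
    }
  }
}
```

A corresponding Driver wires a Mapper of this shape into a Job; unit tests of the same form were exercised with the MR testing library mentioned above.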
Environment: HiveQL, MySQL, HBase, HDFS, Hive, Eclipse (Kepler), Hadoop, Oracle 11g, PL/SQL, SQL*Plus, Toad 9.6, Flume, Pig, Sqoop, UNIX, Cosmos.