- 9+ years of IT experience with clients across different industries, involved in all phases of the SDLC, including around 4 years in big data.
- Deep understanding of Hadoop architecture (versions 1.x and 2.x) and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce concepts, along with Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper and NoSQL databases like HBase.
- Expertise in writing Hive and Pig scripts and UDFs to perform data analysis on large data sets.
- Worked with data formats such as TextFile, SequenceFile, RCFile (Record Columnar), ORC (Optimized Row Columnar) and Parquet in HDFS.
- Partitioned and Bucketed data sets in Apache Hive to improve performance.
- Managed and Scheduled jobs on Hadoop cluster using Apache Oozie.
- Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
- Experienced in developing BI reports and dashboards using Pentaho Reports and Pentaho Dashboards.
- Willing to work on weekends on a rotation basis.
- Experienced with Shell, Perl and Python scripting on Linux, AIX and Windows Platforms.
- Extensive experience in developing web services in Python.
- Worked with the NoSQL database HBase to retrieve data from sparse datasets.
- Experience in creating SparkContext, SQLContext and StreamingContext objects to process large data sets.
- Experience in performing SQL and Hive operations using Spark SQL.
- Performed real time analytics on streaming data using Spark Streaming.
- Created Kafka topics and distributed data to different consumer applications.
- Strong experience in relational database design and development with multiple RDBMSs including Oracle 10g, MySQL and MS SQL Server, and with PL/SQL.
- Developed applications using Java, Python, RDBMS and UNIX shell scripting.
- Experience with Scala's functional programming features, case classes and traits; used Scala to write Spark applications.
- Responsible for building out and improving the reliability and performance of cloud applications and cloud infrastructure deployed on Amazon Web Services.
- Developed shell and Python scripts to check the health of Hadoop daemons and schedule jobs.
- Good knowledge of Azure cloud services and Azure storage.
- Worked on core data pipeline code in Java, C++ and Python, built on Apache Kafka and Apache Storm.
- Migrated servers from physical to cloud environments on AWS, using features such as EC2, S3, Auto Scaling, RDS, ELB, EBS, IAM and Route 53 for installing, configuring, deploying and troubleshooting on various Amazon images.
- Involved in developing distributed enterprise and web applications using UML, Java/J2EE and web technologies that include EJB, JSP, Servlets, Struts 2, JMS, JDBC, JAX-WS, JPA, HTML, XML, XSL, XSLT, JavaScript, Spring and Hibernate.
- Experience in Web application development using Java, Servlets, JSP, JSTL, Java Beans, EJB, JNDI, JDBC, DHTML, CSS, PHP and AJAX.
- Expertise in using J2EE application servers like WebLogic 8.1/9.2 and IBM WebSphere 7.x/6.x, and web servers like Tomcat 5.x/6.x.
- Working knowledge of software design patterns, big data technologies (Hadoop, Hortonworks Sandbox) and cloud technologies and design.
- Experienced in using Agile software methodology (Scrum).
- Designed use case, class, activity, sequence and deployment diagrams and flow charts using Rational Rose.
- Experience with IDEs like Eclipse, NetBeans, RAD and JBuilder for developing J2EE/Java applications.
- Experience with design patterns like MVC, Singleton, Factory, Proxy, DAO, Abstract, Prototype and Adapter.
- Proficient in writing and handling SQL Queries, Stored Procedures, and triggers.
- Hands-on experience with user acceptance, black-box, white-box and unit testing.
- Knowledge of multi-vendor operating systems including Linux, Windows and UNIX, and of shell scripting.
Big Data Ecosystem: HDFS, HBase, Hadoop MapReduce, Pig, Hive, Sqoop, Spark, Scala, Kafka, Storm, Oozie, ZooKeeper, Cassandra
Languages: C, C++, Java, Python, Ruby
Databases: Oracle 10g, DB2, MySQL, SQL Server, MongoDB
Web Technologies: HTML, JavaScript, XML, ODBC, JDBC, MVC, Ajax, JSP, Servlets, Struts
IDE/Testing Tools: Eclipse, AWS
Operating Systems: UNIX, Windows 9/7/XP/2000/NT/ME/98
Version Control Tools: SVN, Git
MS Software Packages: MS Office, MS Excel, MS Access
ETL Tools: Informatica, Talend and Pentaho
Confidential, Bethlehem, PA
Sr. Hadoop Developer
- Imported data from MySQL to HDFS and exported data from HDFS to MySQL using Apache Sqoop
- Modified and Optimized databases to speed up importing to HDFS
- Performed data analysis of online secure data by importing data to HDFS using Apache Flume
- Used Sqoop to import Teradata data to HDFS
- Extracted and transformed data from files, MySQL, Oracle and other input sources and loaded it into HDFS
- Developed Python Mapper and Reducer scripts and implemented them using Hadoop streaming
- Performed AWS Cloud administration managing EC2 instances, S3, SES and SNS services.
- Migrated Business Critical Applications to AWS Cloud before the deadlines.
- Developed, tested, documented, and implemented security policies for AWS Cloud.
- Provided cloud brokering services across multiple Tier 1 cloud providers: Microsoft Azure and AWS.
- Experienced working on Pentaho suite (Pentaho Data Integration, Pentaho BI Server, Pentaho Meta Data and Pentaho Analysis Tool).
- Used Pentaho Reports and Pentaho Dashboard in developing data warehouse architecture, the ETL framework and BI integration.
- Migrated servers, databases, and applications from on-premises to AWS, Azure and Google Cloud Platform
- Built the main applications in Python and Django, leveraging technologies such as Tastypie, Angular.js, Backbone.js and Ember.js.
- Used Python to handle all hits on Django, Redis and other applications.
- Loaded, analyzed and extracted data to and from an Oracle database with Python.
- Used standard and third-party Python modules, e.g. csv, robotparser, itertools, pickle, jinja2 and lxml, for development.
- Installed MarkLogic on an EC2 instance using Ansible.
- Imported data into Hive using Sqoop and Talend Studio.
- Created mappings in Talend between source Oracle databases and target Hive tables.
- Cleaned and preprocessed data using MapReduce for efficient data analysis.
- Used Scala and Java to develop MapReduce programs for data cleansing and analysis.
- Developed custom UDFs using Apache Hive to manipulate data sets.
- Created Hive compact and bitmap indexes to speed up the processing of data.
- Created tables in Hive and inserted and updated their data using DDL and DML commands.
- Improved performance of datasets for querying through.
- Worked with Hive file formats such as ORC, SequenceFile and text file, and with partitions and buckets, to load data into tables and run queries.
- Used Pig custom loaders to load data from different file types such as XML, JSON and CSV.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Scheduled workflow of jobs using Oozie to perform sequential and parallel processing.
- Worked with the NoSQL database HBase to perform operations on sparse data sets.
- Integrated Hive with HBase to upload data and perform row level operations.
- Experienced in creating Spark Context and performing RDD transformations and actions using Python API.
- Used SparkContext to create RDDs from incoming data and apply Spark transformations and actions.
- Created a SQLContext to load data from Parquet and JSON files and run SQL queries
- Created DataFrames out of text files to execute Spark SQL queries.
- Used Spark's enableHiveSupport() to execute Hive queries in Spark.
- Linked Kafka and Flume to Spark by adding dependencies for data ingestion.
- Performed data extraction, aggregation, log analysis on real time data using Spark Streaming.
- Used Scala case classes, higher-order functions and collections to apply map transformations on RDDs.
- Used sbt to build Scala Spark projects and ran them using spark-submit.
- Leveraged Scala's Option type (Some and None) to avoid null pointer exceptions.
- Developed Scala Traits to reuse code in other classes
Environment: HDFS, MapReduce, Hive, Azure, HBase, Pig, Java, AWS, Python, Oozie, Scala, Kafka, Spark, Git, Maven, Talend, Pentaho, PuTTY, CentOS 6.4, SBT
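For illustration only, a minimal sketch of the kind of Python mapper and reducer that Hadoop Streaming runs, as mentioned in the bullets above. The word-count task, the function names and the `run_local` helper are assumptions chosen for the example; they are not the project's actual code.

```python
from itertools import groupby


def mapper(lines):
    """Emit one (word, 1) pair per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts for each key; the input must already be sorted by key,
    which is how Hadoop Streaming delivers it to a reducer."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Hypothetical helper: simulate map, sort and reduce in one process."""
    return dict(reducer(sorted(mapper(lines))))
```

In a real streaming job the two functions would live in separate scripts that read lines from stdin and write tab-separated key/value lines to stdout, and the scripts would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options.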
Confidential, Livonia, MI
- Wrote Pig scripts using various input and output formats, and designed custom formats as per business requirements.
- Used Sqoop to import data from a MySQL relational database into HDFS for processing and to export data back to the RDBMS.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing, analyzing and testing the classifier using MapReduce, Pig and Hive jobs.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Pig, Hive and Sqoop) as well as system-specific jobs (such as Perl and shell scripts).
- Automated all jobs that pull data from relational databases into Hive tables using Oozie workflows, and enabled email alerts on failures.
- Involved in migrating MapReduce jobs to Spark jobs; used Spark SQL and the DataFrame API to load structured and semi-structured data into Spark clusters
- Worked on the Spark engine, creating batch jobs with incremental loads through Storm, Kafka, Splunk and Flume.
- Worked with Kafka on a proof of concept for log processing on a distributed system
- Used the Spark SQL interfaces for Scala and Python, which automatically convert RDDs of case classes to SchemaRDDs
- Worked with a tool that monitored log input from several datacenters via Spark Streaming, analyzed it in Apache Storm, and parsed and saved the data into a database.
- Used tools like Sqoop and Kafka to ingest data into Hadoop
- Implemented database access through JDBC at the server end with Oracle.
- Used Spring Aspect Oriented Programming (AOP) for addressing cross cutting concerns.
- Developed the request/response paradigm using Spring Controllers, Inversion of Control and Dependency Injection with Spring MVC.
- Used CVS for version control and Log4j for logging.
- Used Pig and Hive in the analysis of data.
- Extracted data from NoSQL databases like Cassandra using Sqoop.
- Worked with Flume to import log data from the reaper logs and syslogs into the Hadoop cluster.
- Used complex data types like bags, tuples, and maps in Pig for handling data.
- Created/modified UDF and UDAFs for Hive whenever necessary.
- Managed running and pending MapReduce tasks through the Cloudera Manager console.
- Developed Pig UDFs for preprocessing the data for analysis.
- Involved in writing shell scripts for scheduling and automation of tasks.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
- Hands-on experience with NoSQL databases like HBase and Cassandra in a proof of concept (POC) for storing URLs, images, and product and supplement information in real time.
- Used Hive for analysis and for transforming files from different analytical formats into text files.
- Used Hue for UI-based Pig script execution and Oozie scheduling
- Involved in writing Hive queries for data analysis with respect to business requirements.
- Assisted the admin team with installation and configuration of additional nodes in the Hadoop cluster
Environment: Apache Hadoop (Gen 1), Hive, Pig, Sqoop, Oozie, HBase, MapReduce (MR1), Cloudera, HDFS, Flume, Hue, Linux, HTML5 & CSS3, Hadoop 2.2, jQuery, Maven, MongoDB, Java, JDK 1.6, J2EE, JDBC, Spring 2.0, Hibernate 4.2.
- Collected log data and staging data using Apache Flume and stored it in HDFS for analysis.
- Implemented helper classes that access HBase directly from Java, using the HBase Java API to perform CRUD operations.
- Stored time-series data in HBase and ran time-based analytics on it to improve query retrieval time.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Performed debugging and fine tuning in Hive & Pig for improving performance.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Analyzed web log data using HiveQL to extract the number of unique visitors per day.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed map-side joins on data in Hive to explore business insights.
- Contributed to forecasts based on current results and insights derived from data analysis.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Built application logic using Python and worked on event-driven programming in Python.
- Pulled information from Jira using its REST API and Python to populate Excel files for management reports
- Developed several REST web services supporting both XML and JSON, used by both web and mobile applications.
- Participated in team discussions to develop useful insights from big data processing results.
- Suggested trends to the higher management based on social media data.
Environment: HDFS, MapReduce, Hive, HBase, Pig, Java, Git, Maven, Talend, PuTTY, Python, REST, CentOS 6.3
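The unique-visitors-per-day analysis listed above amounts to a distinct count grouped by date. A minimal Python sketch of that logic follows; the log layout (an ISO timestamp followed by a visitor identifier, separated by whitespace) and the function name are assumptions made for the example, and the work described here was done in HiveQL on the cluster, not with this code.

```python
from collections import defaultdict


def unique_visitors_per_day(log_lines):
    """Return {date: number of distinct visitor identifiers seen that day}."""
    visitors = defaultdict(set)
    for line in log_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        timestamp, visitor = parts[0], parts[1]
        day = timestamp[:10]  # 'YYYY-MM-DD' prefix of an ISO timestamp
        visitors[day].add(visitor)
    return {day: len(ids) for day, ids in visitors.items()}
```

The HiveQL counterpart would be a `COUNT(DISTINCT ...)` over the visitor column with a `GROUP BY` on the date.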
- Developed various UML diagrams like use cases, class diagrams, interaction diagrams (sequence and collaboration) and activity diagrams.
- Responsible for designing and implementing the web tier of the application from inception to completion using J2EE technologies such as an MVC framework, Servlets, JavaBeans and JSP.
- Developed the application using the Struts framework, which follows the classic Model-View-Controller (MVC Model 2) architecture.
- Implemented business processes such as user authentication and account transfer using session EJBs.
- Implemented Hibernate for O/R mapping and persistence.
- Worked on Creative Suite 3 and Creative Suite 4 for creating websites and presentations.
- Involved in component styling (CSS) and skinning.
- Involved in multi-tiered J2EE design utilizing Spring IOC and Hibernate deployed on WebSphere Application Server connecting to DB2 database.
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information such as payment status reports.
- Developed JUnit test cases for all the developed modules.
- Used DB2 extensively as the SQL database.
- Used CVS for version control across common source code used by developers.
- Used Log4J to capture the log that includes runtime exceptions.
- Used JDBC for database connectivity and to invoke stored procedures.
- Responsible for data reconciliation with EOD files using scheduled batch process.
- Responsible for system development using J2EE architecture.
- Used Spring Framework for dependency injection, transaction management and AOP.
- Involved in Spring MVC model integration for the front-end request action controller.
- Developed the application using Spring (including Spring Core), Hibernate, Struts, Oracle, JPA, jQuery and JavaScript.
- Used Spring ORM support, Hibernate for development of DAO layer.
- Involved in implementing the DAO pattern for database connectivity with Hibernate.
- Wrote SQL queries and modified the existing database structure as required to add new features.
- Involved in designing the database and developed stored procedures and triggers using PL/SQL.
- Worked on the JMS connection pool and the implementation of publish/subscribe using Spring JMS.
- Used JmsTemplate to publish and a Message-Driven POJO (MDP) to subscribe to the JMS provider
- Generated object-relational mappings (ORMs) using XML for Java classes and databases.
- Used Eclipse platform to design and code in J2EE stack.
- Designed and developed an enterprise-wide common logging component around Log4j with centralized log support (using the info, error and debug logger levels).
- Implemented Java and J2EE design patterns like Business Delegate, Data Transfer Object (DTO), Data Access Object (DAO) and Service Locator.
- Used jQuery/JavaScript to build a responsive GUI.
- Set up a distributed environment and deployed the application on the distributed system.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Used the Spring Framework AOP module to implement logging in the application to track application status.
XML Parsing/Domain
- Used JDBC to connect the web applications to Databases.
- Used parsers like SAX and DOM for parsing XML documents and performed XML transformations using XSLT.
- Designed REST APIs that allow sophisticated, effective and low cost application integration.
- Designed and documented REST/HTTP APIs, including JSON data formats and API versioning strategy.
- Gained knowledge in building sophisticated distributed systems using REST/hypermedia web APIs (SOA) and developed POCs.
Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, EJB, Struts, Hibernate, WebLogic 8.0, HTML, AJAX, JavaScript, JDBC, XML, JMS, XSLT, UML, JUnit, Log4j, Eclipse 6.0.