- Around 8 years of overall IT experience across a variety of industries, including 4+ years of hands-on experience in Big Data analytics and development.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra, and MongoDB.
- Worked with various HDFS file formats such as Avro and SequenceFile, and compression codecs such as Snappy and bzip2.
- Developed simple to complex MapReduce streaming jobs in Python, integrated with Hive and Pig.
- Skilled in developing applications in Python language for multiple platforms.
- Hands-on experience in application development using Java and Linux shell scripting.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa, according to client requirements.
- Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
- Strong knowledge of Apache Spark in a Scala environment.
- Developed Spark scripts using the Scala shell as per requirements.
- Good hands-on experience creating RDDs and DataFrames from input data and performing data transformations using Spark with Scala.
- Good knowledge on real time data streaming solutions using Apache Spark Streaming, Kafka and Flume.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams for systems requirement, design & development.
- Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server, object-oriented, and web-based environments.
- Experience in using various IDEs Eclipse, IntelliJ and repositories SVN and Git.
- Experience of using build tools Ant, Maven.
- Prepared standard coding guidelines and analysis and testing documentation.
- Good interpersonal skills; committed, results-oriented, and hardworking, with a drive to learn new technologies.
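As an illustration of the Python MapReduce streaming work noted above, a minimal word-count mapper/reducer pair might look like the following (a hypothetical example, not code from any specific engagement):

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) pairs, as a Hadoop Streaming mapper would write to stdout."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum counts per key; sorting stands in for Hadoop's shuffle phase,
    which delivers pairs to the reducer grouped by key."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)
```

In a real Hadoop Streaming job, the mapper and reducer would each read from stdin and write tab-separated key/value lines to stdout.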
Big Data Technologies: HDFS, MapReduce, YARN, Zookeeper, Hive, Pig, Sqoop, Flume, Spark, Storm, Impala, Oozie, Kafka, NiFi.
NoSQL Databases: HBase, Cassandra, MongoDB, Couchbase.
Distributions: Cloudera, Hortonworks, Amazon Web Services, Azure.
Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful
Application Servers: WebLogic, WebSphere, JBoss, Tomcat.
Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, ANT
Business Intelligence Tools: Tableau, Splunk, QlikView
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
Version Control Tools: Git, SVN
Sr. Hadoop/Spark Developer
Confidential, Cary, NC
- Responsible for building scalable distributed data solutions using Hadoop.
- Collected and aggregated large amounts of web log data from different sources such as web servers, mobile devices, and network devices using Apache Flume, and stored the data in HDFS for analysis.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Used Spark SQL to handle structured data in Hive.
- Implemented data ingestion and cluster handling for real-time processing using Kafka.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Involved in creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Involved in migrating tables from RDBMS into Hive tables using Sqoop and later generating visualizations using Tableau.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Created partitions and buckets based on State for further processing using bucket-based Hive joins.
- Involved in transferring data from mainframe tables to HDFS and HBase tables using Sqoop.
- Defined the Accumulo tables and loaded data into tables for near real-time data reports.
- Created the Hive external tables using Accumulo connector.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Used different data formats (text and ORC) when loading data into HDFS.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Strong experience working with Elastic MapReduce (EMR) and setting up environments on Amazon EC2 instances.
- Able to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Collected data using Spark Streaming from an AWS S3 bucket in near real time, performed the necessary transformations and aggregations to build the data model, and persisted the data in HDFS.
- Imported data from different sources such as AWS S3 and LFS into Spark RDDs.
- Used HCatalog to access Hive table metadata from MapReduce and Pig code.
- Created shell scripts to simplify the execution of other scripts (Pig, Hive, Sqoop, Impala, and MapReduce) and to move data in and out of HDFS.
- Created files and tuned SQL queries in Hive using Hue.
- Experience working with Apache SOLR for indexing and querying.
- Created custom Solr query segments to optimize search matching.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD's.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Analyzed SQL scripts and designed solutions to implement them using PySpark.
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
- Loaded data into HBase using the HBase shell as well as the HBase client API.
- Designed the ETL process and created the high-level design document, including logical data flows, the source data extraction process, database staging, job scheduling, and error handling.
- Designed and developed ETL jobs using Talend Integration Suite in Talend 5.2.2.
- Created ETL mappings with Talend Integration Suite to pull data from sources, apply transformations, and load data into the target database.
- Good experience with continuous integration of applications using Jenkins.
- Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Environment: Hadoop, Cloudera, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Apache Spark, Accumulo, Oozie Scheduler, AWS, Tableau, Java, Talend, Hue, HCatalog, Flume, Solr, Git, Maven.
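The Hive partition/bucket work above relies on Hive's scheme of assigning rows to a fixed number of buckets by hashing the clustering column (`CLUSTERED BY (state) INTO N BUCKETS`). A minimal sketch of that assignment, with a stand-in hash and hypothetical data:

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; assumes CLUSTERED BY (state) INTO 4 BUCKETS

def bucket_for(state, num_buckets=NUM_BUCKETS):
    # Hive hashes the column value modulo the bucket count; a simple
    # deterministic hash is used here so the sketch is reproducible.
    return sum(ord(c) for c in state) % num_buckets

def bucketize(rows):
    """Group rows into buckets the way a bucketed Hive table lays out its files."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row["state"])].append(row)
    return dict(buckets)
```

Bucket-based joins work because two tables bucketed the same way on the join key place matching rows in corresponding buckets, so each bucket pair can be joined independently.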
Confidential, Seattle, WA
- The main aim of the project was to tune the performance of existing Hive queries and to prepare Spark jobs, scheduled daily on Tez.
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on POCs with Apache Spark, using Scala to implement Spark in the project.
- Consumed the data from Kafka using Apache Spark.
- Ingested streaming data with Apache NiFi into Kafka.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high volumes of data.
- Responsible for design and development of Spark SQL scripts based on functional specifications.
- Responsible for Spark Streaming configuration based on the type of input.
- Performed real-time streaming of data using Spark: SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Wrote MapReduce jobs to parse web logs stored in HDFS.
- Worked with NiFi to manage the flow of data from source to HDFS.
- Developed services to run MapReduce jobs on an as-required basis.
- Imported and exported data between HDFS, Hive, and Pig using Sqoop.
- Responsible to manage data coming from different sources.
- Monitoring the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems to HDFS. Installed and configured Hive and written Pig/Hive UDFs.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
- Wrote MapReduce (Hadoop) programs to convert text files into Avro and load them into Hive tables.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi structured data coming from various sources.
- Developed design documents considering all possible approaches and identifying the best of them.
- Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
- Loaded data into HBase using bulk load and non-bulk load.
- Developed scripts to automate data management end to end and keep all clusters in sync.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Partitioned data streams using Kafka. Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used Kafka producer 0.8.3 APIs to produce messages.
- Involved in gathering the requirements, designing, development and testing.
- Followed agile methodology for the entire project.
- Prepared technical design documents and detailed design documents.
Environment: Hadoop, Spark Core, Spark SQL, Spark Streaming, HDFS, MapReduce, Hive, HBase, Flume, Java, Maven, Impala, Pig, Oozie, NiFi, Oracle, YARN, GitHub, JUnit, Unix, Hortonworks, Sqoop, Scala, Python.
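The Kafka partitioning described in this project follows the standard key-hash scheme: records with the same key always land on the same partition, which preserves per-key ordering. A minimal sketch (the partition count and keys are hypothetical; Kafka's default partitioner uses murmur2, and crc32 is used here only as a stand-in with the same stability property):

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical topic configuration

def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Deterministically map a record key (bytes) to a partition index."""
    return zlib.crc32(key) % num_partitions

def assign(records):
    """Group (key, value) records by target partition, preserving per-key order."""
    partitions = {}
    for key, value in records:
        partitions.setdefault(partition_for(key), []).append((key, value))
    return partitions
```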
Confidential - Columbus, OH
- Installed, configured, and maintained Apache Hadoop clusters for application development, along with major components of the Hadoop ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, and ZooKeeper.
- Importing and exporting data into HDFS and Hive from different RDBMS using Sqoop
- Experienced in defining job flows to run multiple MapReduce and Pig jobs using Oozie
- Imported log files using Flume into HDFS and loaded them into Hive tables to query the data.
- Developed Hive jobs to transfer 8 years of bulk data from DB2, MS SQL Server to HDFS layer.
- Used HBase-Hive integration and wrote multiple Hive UDFs for complex queries.
- Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode; mapped DB2 V9.7 and V10.x data types to Hive data types and performed validations.
- Involved in writing APIs to read HBase tables, cleanse data, and write to another HBase table.
- Created multiple Hive tables, implemented Partitioning, Dynamic Partitioning and Buckets in Hive for efficient data access
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Responsible for coding Java batch programs, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code review, troubleshooting, and status reporting.
- Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
- Experienced in writing programs using HBase Client API
- Involved in loading data into HBase using HBase Shell, HBase Client API, Pig and Sqoop.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile devices and pushed to HDFS.
- Experienced in design, development, tuning and maintenance of NoSQL database.
- Experience in using HBase as backend database for the application development.
- Developed unit test cases for Hadoop MapReduce jobs with MRUnit.
- Used bzip2 compression to compress files before loading them into Hive.
- Excellent experience in ETL analysis, designing, developing, testing and implementing ETL processes including performance tuning and query optimizing of database.
Environment: Apache Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Cloudera, Java, Linux, MySQL Server, MS SQL, SQL, PL/SQL, NoSQL
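The multi-format extraction work above (XML, JSON, and CSV into a common record shape) can be sketched in pure Python; the field names and record tag here are hypothetical:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def parse_json_lines(text):
    """One JSON object per line -> list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def parse_csv(text):
    """Header row + data rows -> list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def parse_xml(text, record_tag="record"):
    """Flat <record> elements -> list of dicts keyed by child tag."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in rec} for rec in root.iter(record_tag)]

def extract(text, fmt):
    """Dispatch on source format and return uniform dict records."""
    return {"json": parse_json_lines, "csv": parse_csv, "xml": parse_xml}[fmt](text)
```

In a MapReduce setting the same per-record normalization would run inside the mapper, with each input split feeding one of the parsers.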
- Prepared Functional Requirement Specifications and performed coding, bug fixing, and support.
- Involved in various phases of Software Development Life Cycle (SDLC) as requirement gathering, data modeling, analysis, architecture design & development for the project.
- Designed the front-end applications, user interactive (UI) web pages using web technologies like HTML, XHTML, and CSS.
- Involved in creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
- Developed SOAP based Web Services for Integrating with the Enterprise Information System Tier.
- Use ANT scripts to automate application build and deployment processes.
- Involved in design, development and Modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules into the application.
- Used Struts MVC architecture and SOA to structure the project module logic.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Good Informatica ETL development experience in an offshore and onsite model; involved in ETL code reviews and testing of ETL processes.
- Developed mappings in Informatica to load the data including facts and dimensions from various sources into the Data Warehouse, using different transformations like Source Qualifier, JAVA, Expression, Lookup, Aggregate, Update Strategy and Joiner.
- Developed microservices using RESTful services to provide all CRUD capabilities.
- Scheduled sessions to extract, transform, and load data into the warehouse database per business requirements.
- Used the Struts MVC framework for developing J2EE-based web applications.
- Extensively used Java multi-threading to implement batch Jobs with JDK 1.5 features.
- Designed an entire messaging interface and Message Topics using WebLogic JMS.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
- Migrated datasource passwords to encrypted passwords using Vault tool in all the JBoss application servers.
- Used Spring Framework for Dependency injection and integrated with the Hibernate framework.
- Developed Session Beans that encapsulate the workflow logic.
- Used JMS (Java Messaging Service) for asynchronous communication between different modules.
- Developed web components using JSP, Servlets and JDBC.
- Involved in client requirement gathering, analysis & application design.
- Involved in implementing the design through the vital phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support, following the Waterfall methodology.
- Involved in Database Connectivity through JDBC.
- Used Ajax to make asynchronous calls to the server side and retrieve JSON or XML data.
- Developed server side presentation layer using Struts MVC Framework.
- Developed Action classes, Action Forms and Struts Configuration file to handle required UI actions and JSPs for Views.
- Developed batch jobs using EJB scheduling and leveraged container-managed transactions for highly transactional workloads.
- Used various core Java concepts such as multithreading, exception handling, the Collections API, and garbage collection for dynamic memory management to implement various features and enhancements.
- Developed Hibernate entities, mappings and customized criterion queries for interacting with database.
- Implemented and developed REST and SOAP based Web Services to provide JSON and XML data.
- Involved in implementation of web services (top-down and bottom-up).
- Used JPA and JDBC in the persistence layer to persist the data to the DB2 database.
- Created and wrote SQL queries, tables, triggers, views, and PL/SQL procedures to persist and retrieve data from the database.
- Developed a Web service to communicate with the database using SOAP.
- Performance Tuning and Optimization with Java Performance Analysis Tool.
- Implemented JUnit test cases for Struts/Spring components and used JUnit for unit testing.
- Used Eclipse as the IDE and worked on installing and configuring JBoss.
- Made use of CVS for checkout and check in operations.
- Deployed the components in to WebSphere Application server
- Worked with production support team in debugging and fixing various production issues.
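The REST CRUD services in this role mapped HTTP verbs to persistence operations (POST=create, GET=read, PUT=update, DELETE=delete). A minimal in-memory sketch of that mapping, in Python rather than the original Java, with a hypothetical resource:

```python
class InMemoryCrudService:
    """Illustrates the verb-to-operation mapping behind a REST CRUD endpoint."""

    def __init__(self):
        self._store = {}
        self._next_id = 1

    def create(self, data):          # POST /orders
        rid = self._next_id
        self._next_id += 1
        self._store[rid] = dict(data)
        return rid

    def read(self, rid):             # GET /orders/{id}
        return self._store.get(rid)

    def update(self, rid, data):     # PUT /orders/{id}
        if rid not in self._store:
            raise KeyError(rid)
        self._store[rid] = dict(data)

    def delete(self, rid):           # DELETE /orders/{id}
        self._store.pop(rid, None)
```

In the actual application the store would be a database accessed through JDBC/JPA, with the same verb semantics exposed by the service layer.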