- Around 6 years of IT experience across the Software Development Life Cycle (analysis, design, development, testing, deployment and support) using Waterfall and Agile methodologies.
- Around 4 years of experience in data analysis using Hadoop ecosystem components (Spark, HDFS, MapReduce, Pig, Sqoop, Hive, Cassandra and HBase) in the financial, retail and healthcare sectors.
- Experience with Hadoop components such as HDFS, MapReduce, JobTracker, NameNode, DataNode, TaskTracker and Apache Spark.
- Hands-on experience capturing data from existing relational databases (Oracle, MySQL, SQL Server and Teradata) that provide SQL interfaces, using Sqoop.
- Hands-on experience with Sequence files, RC files, Avro and Parquet formats, and with combiners, counters, dynamic partitions and bucketing for best practices and performance improvement.
- Skilled in developing MapReduce programs using the Java API and in using Hive and Pig for data analysis, data cleaning and data transformation.
- Developed multiple MapReduce jobs to perform data cleaning and preprocessing.
- Designed Hive queries and Pig scripts for data analysis and data transfer, and designed tables to load data into the Hadoop environment.
- Experience working with the Cloudera and Hortonworks Hadoop distributions.
- Expertise in writing Hive UDFs and Generic UDFs to incorporate complex business logic into Hive queries.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (S3), Amazon EMR and Amazon Elastic Compute Cloud (EC2).
- Implemented Sqoop for large dataset transfer between Hadoop and RDBMS.
- Expertise with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Worked with different file formats (ORC, Parquet, Avro, text) and compression codecs (GZIP, Snappy, LZO, BZIP2).
- Experience writing shell scripts to dump shared data from MySQL servers to HDFS.
- Proficient in using Apache Sqoop to import and export data between relational databases and HDFS.
- Performed ETL operations using Pig to join, clean, aggregate and analyze data.
- Worked with Maven, Ant, sbt and Gradle for the build process.
- Extensive experience importing and exporting data using streaming platforms such as Flume and Kafka.
- Experience with ZooKeeper and the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Knowledge of creating Impala views on top of Hive tables for faster data analysis.
- Integrated BI tools such as Tableau with Impala and analyzed the data.
- Experienced in performance tuning and real-time analytics in both relational databases and NoSQL databases (HBase).
- Worked on MongoDB using CRUD operations (Create, Read, Update and Delete), indexing, replication and sharding features.
- Experience with NoSQL databases like HBase, MongoDB and Cassandra.
- Hands-on experience designing and developing Spark applications in Scala and comparing the performance of Spark with Hive.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing.
- Experience collecting log data from different sources (web servers and social media) using Flume and Kafka, and storing it in HDFS for MapReduce jobs.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data.
- Exposure to working with DataFrames.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations, performing read/write operations and saving results to an output directory in HDFS (a minimal sketch follows this summary).
- Extensive experience working with Cloudera (CDH4 and CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
- Exposure to simplifying and automating big data integration with Talend (ETL), using graphical tools and wizards that generate native code.
- Knowledge of importing results into the visualization tool Tableau to create dashboards.
- Very good understanding and working knowledge of Object-Oriented Programming (OOP), J2SE, multithreading in Core Java, HTML, Servlets, JSP and JDBC.
- Experience in working with different relational databases like MySQL, MS SQL and Oracle.
- Strong experience in database design and in writing complex SQL queries and stored procedures.
- Expertise in various phases of software development, including analysis, design, development and deployment of applications using Servlets, JSP, JavaBeans, Struts, the Spring Framework and JDBC.
- Experience with development IDEs such as Eclipse and NetBeans.
- Involved in Agile methodologies, daily Scrum meetings and sprint planning.
- Good analytical, communication and problem-solving skills, with a strong interest in learning new technical and functional skills.
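A minimal sketch of the Spark SQL / DataFrame read-transform-write pattern summarized above, assuming Spark 2.x; the input path, column names and output location are illustrative placeholders rather than values from any actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerSpend {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CustomerSpend").getOrCreate()

    // Read a structured extract from HDFS into a DataFrame (placeholder path).
    val txns = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/transactions")

    // Transform: drop bad records and aggregate spend per customer.
    val spend = txns
      .filter(col("amount") > 0)
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_spend"))

    // Write the results back to an HDFS output directory as Parquet.
    spend.write.mode("overwrite").parquet("hdfs:///data/output/customer_spend")
    spark.stop()
  }
}
```

The same pattern applies when the source is a Hive table or Sqoop-landed data; only the read step changes.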
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume.
NoSQL Databases: HBase, Cassandra, and MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Java, C/C++, Scala, Pig Latin, HiveQL
Scripting Languages: Shell Scripting, JavaScript
Databases: MySQL, Oracle, Teradata, DB2
Build Tools: Maven, Ant, Gradle, sbt
Reporting Tool: Tableau
Version control Tools: SVN, Git, GitHub
Cloud: AWS, Azure
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
Operating Systems: Windows 10/8/Vista/XP
Development IDEs: NetBeans, Eclipse, Python IDLE
- Developed a data pipeline using Kafka, Sqoop and Hive to ingest customer transactional and behavioral data into HDFS for processing and analysis.
- Wrote Sqoop scripts for importing data into and exporting data out of HDFS and Hive.
- Developed scripts to automate end-to-end data management and synchronization across all the clusters.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop.
- Responsible for importing and exporting streaming data into HDFS using streaming platforms such as Flume and the Kafka messaging system.
- Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.
- Worked extensively with Spark Streaming and Apache Kafka to fetch live streaming data (a minimal sketch follows this section).
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Worked with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Imported results into the BI visualization tool Tableau to create dashboards.
- Worked in an Agile methodology and used JIRA to maintain project stories.
- Involved in requirements gathering, design, development and testing.
- Developed services to run MapReduce jobs as per requirements.
- Responsible for managing data coming from different sources.
- Developed business logic using Scala.
- Responsible for loading data from UNIX file systems into HDFS. Installed and configured Hive and wrote Pig/Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, Spark Streaming, Kafka.
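A minimal sketch of the Kafka-to-HDFS streaming path described in this section, assuming Spark 2.x with the spark-streaming-kafka-0-10 connector; the broker address, topic name, consumer group and output path are illustrative placeholders, not values from the project.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object TransactionStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TransactionStream")
    val ssc  = new StreamingContext(conf, Seconds(10))    // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",               // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "txn-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a placeholder topic and pull records as a direct stream.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("transactions"), kafkaParams)
    )

    // Persist the raw message values to HDFS, one directory per micro-batch.
    stream.map(_.value).saveAsTextFiles("hdfs:///data/raw/transactions")

    ssc.start()
    ssc.awaitTermination()
  }
}
```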
- Extracted the data from RDBMS into HDFS using Sqoop.
- Loaded and transformed large sets of structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Developed Spark applications using Scala per business requirements.
- Used Spark DataFrame operations to perform required validations on the data.
- Implemented Hive partitioning and bucketing for data analytics (see the sketch following this section).
- Responsible for performing sort, join, aggregation, filter and other transformations on the datasets.
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Involved in creating views for data security.
- Involved in the performance tuning of Spark applications.
- Worked on performance tuning in Hive.
- Involved in creating workflows to run Sqoop jobs.
- Involved in Agile methodologies, daily Scrum meetings and sprint planning.
- Used version control tools such as GitHub to share code among team members.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Developed Hive UDFs and wrote complex Hive queries for data analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Used ETL processes to load data from flat files into the target database, applying business logic in transformation mappings to insert and update records during loads.
- Worked on data serialization formats, converting complex objects into sequences of bits using the Avro and ORC file formats.
- Designed and developed Hive tables to store staging and historical data.
- Implemented Hortonworks NiFi (HDP 2.4) and recommended a solution to ingest data from multiple data sources into HDFS and Hive using NiFi.
- Experience in using ORC file format with Snappy compression for optimized storage of Hive tables.
- Developed Oozie workflow for scheduling and orchestrating the ETL process.
- Involved in migrating MapReduce jobs into Spark jobs.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Integrated BI tools such as Tableau with Impala and analyzed the data.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Pig, Hive, Oozie, Scala, Spark, Spark SQL, NiFi, Hortonworks.
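A minimal sketch of the partitioning, bucketing and ORC-with-Snappy storage described in this section, expressed with Spark's DataFrameWriter rather than a Hive DDL; the paths, table name and columns are illustrative assumptions, not the project's actual schema.

```scala
import org.apache.spark.sql.SparkSession

object OrdersToWarehouse {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("OrdersToWarehouse")
      .enableHiveSupport()
      .getOrCreate()

    // Read the staged extract and apply a basic validation filter.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/staging/orders")
      .filter("order_id IS NOT NULL")

    // Persist as a partitioned, bucketed table stored as ORC with Snappy compression.
    orders.write
      .partitionBy("order_date")       // one directory per date partition
      .bucketBy(32, "customer_id")     // 32 buckets keyed on customer_id
      .sortBy("customer_id")
      .format("orc")
      .option("compression", "snappy")
      .saveAsTable("orders_partitioned")
  }
}
```

Partition pruning on order_date and bucketing on customer_id are the levers the bullets above refer to for faster analytical queries.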
- Developed Hive queries for extracting data and sending it to clients.
- Created Scala programs to develop reports for business users.
- Created Hive UDFs in Scala for formatting data (see the sketch following this section).
- Distributed programming with Spark, specifically in Scala.
- Performed transformation and analysis in Hive/Pig and parsed raw data using MapReduce and Spark.
- Worked on capturing transactional changes in the data using MapReduce and HBase.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop and Pig Latin.
- Familiar with AWS components such as EC2 and S3.
- Worked with Sqoop import and export functionality to handle large dataset transfers between a DB2 database and HDFS.
- Worked on ingesting data from different sources.
- Supported multiple application extracts coming out of the Big Data platform.
- Followed Agile methodology during project delivery.
- Knowledge of CodeHub and Git.
- Worked and coordinated with the offshore team to complete tasks.
- Understanding of the ServiceNow tool for submitting change requests and incidents for application deployments.
Environment: MapR, Hive, Pig, Spark, Scala, MapReduce, UNIX scripting, Talend.
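A minimal sketch of a formatting Hive UDF written in Scala, as referenced in this section; it assumes the hive-exec library on the classpath, and the class name and the date format it normalizes are illustrative assumptions.

```scala
import java.text.SimpleDateFormat
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text
import scala.util.Try

// Formatting UDF: normalizes MM/dd/yyyy date strings to yyyy-MM-dd.
class NormalizeDate extends UDF {
  private val inputFormat  = new SimpleDateFormat("MM/dd/yyyy")
  private val outputFormat = new SimpleDateFormat("yyyy-MM-dd")

  // Hive calls evaluate() once per row; nulls and unparsable values pass through unchanged.
  def evaluate(value: Text): Text = {
    if (value == null) null
    else Try(new Text(outputFormat.format(inputFormat.parse(value.toString)))).getOrElse(value)
  }
}
```

Once packaged into a jar, such a function would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION normalize_date AS 'NormalizeDate'.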
- Interacted with the business team for detailed requirement specifications and issue resolution.
- Developed user interfaces using HTML, XML, CSS, JSP, JavaScript and Struts Tag Libraries, and defined common page layouts using custom tags.
- Implemented Struts MVC components such as Action Mappings, Action classes, Action Forms, the Validation Framework, Struts Tiles and Struts Tag Libraries.
- Involved in developing the front end of the application using the Struts framework and its interaction with controller Java classes.
- Domain model creation and enhancement using XSD and Hibernate.
- Provided development support for System Testing, User Acceptance Testing and Production and deployed application on JBoss Application Server.
- Wrote and executed efficient SQL queries (CRUD operations), JOINs on multiple tables, to create and test sample test data in Oracle Database using Oracle SQL Developer.
- Used CVS for file check-in and check-out to control file versions.
- Used Eclipse as an IDE.
- Used HP Quality Center to track activities and defects.
- Implemented logging with Log4j.
- Used Ant to compile and build project.
- Developed style sheets to make the pages dynamic, was extensively involved in unit and system testing using JUnit, and was involved in critical bug fixing.
- Utilized base UML methodologies and use cases modeled by architects to develop the front-end interface; class, sequence and state diagrams were developed using Visio.