- 7+ years of IT experience across the Software Development Life Cycle (analysis, design, development, testing, deployment and support) using Waterfall and Agile methodologies, including 3+ years of data analysis with Hadoop ecosystem components (Spark, HDFS, MapReduce, Pig, Sqoop, Kafka, Hive, Cassandra and HBase) in the financial, retail and healthcare sectors.
- Experience with Hadoop components such as HDFS, MapReduce, JobTracker, NameNode, DataNode, TaskTracker and Apache Spark.
- Experience using Sqoop to import data from existing relational databases with SQL interfaces (Oracle, MySQL and Teradata).
- Hands-on experience with Avro, Parquet and RCFile formats, and with combiners, counters, dynamic partitions and bucketing for best practices and performance improvement.
- Experience developing MapReduce programs with the Java API, and using Hive and Pig for data analysis, data cleaning and data transformation.
- Designed Hive queries and Pig scripts for data analysis, data transfer and table design to load data into the Hadoop environment.
- Expertise in writing Hive UDFs and GenericUDFs to incorporate complex business logic into Hive queries.
- Implemented Sqoop scripts for large dataset transfer between Hadoop and RDBMS.
- Expertise in the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Worked with different file formats (ORCFILE, TEXTFILE) and compression codecs (Gzip, Snappy, Bzip2).
- Experience writing shell scripts to dump shared data from MySQL servers to HDFS.
- Performed ETL operations in Pig to join, clean, aggregate and analyze data.
- Used Maven for the build process.
- Extensive experience importing and exporting data using streaming platforms such as Flume and Kafka.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows, and with ZooKeeper for cluster coordination.
- Knowledge of creating Impala views on top of Hive tables for faster data access and analysis.
- Integrated BI tools such as Tableau with Impala to analyze data.
- Experience with NoSQL databases like HBase, Cassandra and MongoDB.
- Hands-on experience designing and developing Spark applications in Scala to compare the performance of Spark with Hive.
- Experience collecting log data from different sources (web servers and social media) using Flume and Kafka and storing it in HDFS for processing with MapReduce jobs.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data.
- Exposure to working with DataFrames in Spark.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Extensive experience working with Cloudera (CDH4 and CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
- Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), Amazon EMR and Amazon Elastic Compute Cloud (Amazon EC2).
- Knowledge of creating dashboards with business intelligence tools such as Tableau.
- Very good understanding and working knowledge of object-oriented programming (OOP), J2SE, multithreading in Core Java, HTML, Servlets, JSP and JDBC.
- Experience working with relational databases such as MySQL and Oracle.
- Working knowledge of database design and of writing complex SQL queries and stored procedures.
- Capable of using AWS services such as EMR, S3 and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Knowledge of using PyCharm and the Python shell to develop Spark-based applications in Python.
- Expertise in various phases of software development, including analysis, design, development and deployment of applications using Servlets, JSP, JavaBeans, Struts, Spring Framework and JDBC.
- Experience with development environments such as Eclipse and NetBeans.
- Followed Agile methodologies, including daily Scrum meetings and sprint planning.
- Good analytical, communication and problem-solving skills; eager to learn new technical and functional skills.
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Impala, YARN, Hue, Oozie, ZooKeeper, Solr, Apache Spark, Apache Storm, Apache Kafka, Sqoop, Flume
NoSQL Databases: HBase, Cassandra, MongoDB
Hadoop Distributions: Cloudera, Hortonworks
Programming Languages: Java, C, Scala, Pig Latin, HiveQL
Scripting Languages: Shell Scripting
Databases: MySQL, Oracle, Teradata, DB2
Build Tools: Maven, Ant, sbt
Reporting Tool: Tableau
Version Control Tools: SVN, Git, GitHub
Cloud: AWS, Azure
App/Web Servers: WebSphere, WebLogic, JBoss, Tomcat
Operating Systems: Windows 10/8/Vista/XP
Development IDEs: NetBeans, Eclipse, Python IDLE
Packages: Microsoft Office, PuTTY, MS Visual Studio
Confidential, New York City, NY
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Sqoop scripts to import data into and export data from HDFS and Hive.
- Developed design documents that evaluated the possible approaches and identified the best one.
- Responsible for managing data coming from different sources.
- Developed business logic in Scala.
- Responsible for loading data from UNIX file systems into HDFS.
- Created Hive tables, loaded data into them and wrote Hive queries that run as MapReduce jobs in the backend.
- Worked with the NoSQL database HBase, creating tables to load large sets of semi-structured data from various sources.
- Developed scripts to automate end-to-end data management and synchronization across all clusters.
- Explored Spark to improve the performance and optimize the existing algorithms in Hadoop.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Developed functional programs in Scala to connect the streaming data application and gather web data.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Configured the connection between Hive and Tableau through Impala for BI development.
- Worked in an Agile methodology and used JIRA to maintain the project's user stories.
- Wrote automated Unix shell scripts to perform database activities.
- Worked with Linux distributions such as Red Hat and CentOS.
Environment: Hadoop, MapReduce, Hive, Java, Maven, Impala, Pig, Spark, Oozie, Oracle, YARN, GitHub, Tableau, Unix, Cloudera, Kafka, Sqoop, Scala, HBase.
Confidential, Rocky Hill, CT
- Created Hive queries to extract data and send it to clients.
- Created Scala programs to generate reports for business users.
- Created Hive UDFs in Scala for formatting data.
- Performed distributed programming with Spark, specifically in Scala.
- Performed transformation and analysis in Hive/Pig and parsed raw data using MapReduce and Spark.
- Proactively involved in ongoing maintenance, support and improvements of the Hadoop cluster.
- Worked on capturing transactional changes in the data using MapReduce and HBase.
- Analyzed the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MapReduce, Hive, Spark, Sqoop and Pig Latin.
- Familiar with AWS components such as EC2 and S3.
- Worked with Sqoop import and export functionality to handle large dataset transfers between a DB2 database and HDFS.
- Worked on ingesting data from different sources.
- Supported multiple application extracts coming out of Big Data Platform.
- Followed agile methodology during project delivery.
- Knowledge of CodeHub and Git.
- Worked and coordinated with the offshore team to complete tasks.
- Understanding of the ServiceNow tool for submitting change requests and incidents for application deployments.
Environment: MapR, Hive, Pig, Spark, Scala, MapReduce, UNIX scripting, HBase, Talend.
Confidential, Cranston, RI
- Implemented technical architecture and developed various Big Data workflows using custom MapReduce, Pig, Hive, Cassandra and Sqoop.
- Deployed an on-premises cluster and tuned it for optimal job execution performance when processing large datasets.
- Built reusable Hive UDF libraries for business requirements, enabling business analysts to use these UDFs in Hive queries.
- Used Kafka to dump the application server logs into HDFS.
- Analyzed the logs stored on HDFS and imported the cleaned data into the Hive warehouse, enabling business analysts to write Hive queries.
- Configured various big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce.
- Worked with the NoSQL database HBase for real-time data analytics.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Assigned tasks for resolving defects found while testing the new and existing applications.
- Analyzed requirements and designed and developed solutions.
- Managed the project team to achieve project goals, including resource allocation, resolving technical issues and mentoring team members.
- Used Linux (Ubuntu) machines to design, develop and deploy Java modules.
Environment: MapReduce, Pig, Hive, Sqoop, Kafka, Flume, HBase, JDK 1.6, Maven, Linux.
Confidential, Rolling Meadows, IL
- Wrote design documents and specifications for near-real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager on the clusters.
- Used a 15-node cluster with the Cloudera Hadoop distribution on Amazon EC2.
- Developed ad-click-based data analytics for keyword analysis and insights.
- Crawled public Facebook posts and tweets.
- Used Flume and Kafka to get the streaming data from Twitter and Facebook.
- Worked with the data science team on MapReduce jobs to analyze this data.
- Converted the output to structured data and imported it into Tableau with the analytics team.
- Defined problems to identify the right data, and analyzed results to make room for new projects.
Environment: Hadoop, HBase, HDFS, MapReduce, Flume, Java, Tableau, Cloudera Manager, Amazon EC2.
- Interacted with the business team to obtain detailed requirement specifications and resolve issues.
- Developed user interfaces using HTML, XML, CSS, JSP, JavaScript and Struts Tag Libraries, and defined common page layouts using custom tags.
- Implemented Struts MVC components such as ActionMapping, Action classes, ActionForm, the Validation Framework, Struts Tiles and Struts Tag Libraries.
- Developed the front end of the application using the Struts framework and its interaction with the controller Java classes.
- Provided development support for system testing, user acceptance testing and production, and deployed the application on JBoss Application Server.
- Wrote and executed efficient SQL queries (CRUD operations and JOINs across multiple tables) to create and test sample data in Oracle Database using Oracle SQL Developer.
- Used CVS to check files in and out for version control.
- Used Eclipse as an IDE.
- Used HP Quality Center to track activities and defects
- Implemented logging with Log4j
- Used Maven to compile and build project.
- Developed style sheets for the pages, performed extensive unit and system testing using JUnit, and fixed critical bugs.
- Used the UML models and use cases created by the architects to develop the front-end interface, and developed class, sequence and state diagrams using Visio.
- Installed and configured SQL Server 2005 on database servers.
- Developed database objects such as tables, views, user-defined functions and triggers to handle complex business rules, history data and audit analysis.
- Wrote complex T-SQL queries, subqueries, correlated subqueries and joins to fetch data according to the functional requirements.
- Used common table expressions (CTEs) for hierarchical data and in complex stored procedures.
- Created integrity constraints (primary key, foreign key, unique and check) to support application functionality.
- Used the command shell to invoke executables from SQL stored procedures.
- Actively participated in gathering user requirements and system specifications.
- Administered user accounts for different domains.
- Created SQL reports and generated emails through Database Mail.
- Loaded data from Excel using OPENROWSET commands.
- Created and maintained indexes for a fast and efficient reporting process.
- Created SSIS packages to load data from flat files and DB2 using Lookup, Derived Column, Data Conversion and Conditional Split transformations.
- Maintained the physical databases by monitoring performance, space utilization and physical integrity.
- Generated reports according to requirements using SSRS.
Environment: MS SQL Server 2005, Microsoft Visual Studio 2005, SSIS, SSRS, DB2, Windows Server 2005, Performance Monitor, MS Office.