- Around 8 years of IT experience in analysis, design, development, testing, and implementation as a Big Data and Java developer.
- Around 4 years of experience with Big Data technologies and Hadoop ecosystem components such as AWS S3, EMR, EC2, HDFS, MapReduce, Python, Spark (PySpark), Hive, Sqoop, YARN, Pig, Impala, Hue, Airflow, Flume, and Kafka, and NoSQL systems such as HBase and Cassandra.
- Experience working with Hadoop clusters using the AWS, Cloudera, and Hortonworks distributions.
- Proficient in the ETL process of data sourcing, mapping, transformation, conversion, and loading using Spark.
- Experience in performance tuning of Spark applications using coalesce, repartitioning, broadcast variables, and tuning of Spark executor memory.
- Extensively worked on dynamic partitioning and bucketing of both managed and external tables in Hive.
- Developed UDFs and UDAFs in Hive and Pig to extend built-in functionality.
- Worked with a variety of data like text files, JSON data, etc.
- Built Hive tables in different storage formats such as Avro, RCFile, ORC, and Parquet.
- Experience in tuning and troubleshooting Hive queries.
- Extensive experience working with semi-structured and unstructured data, implementing complex MapReduce programs using combiners, partitioners, and design patterns.
- Used Sqoop to import/export data between relational databases and HDFS/S3.
- Strong knowledge of distributed system architecture and parallel processing frameworks.
- Proficient in UNIX Shell scripting for automating deployments and other tasks.
- Scheduled jobs in Airflow.
- Experience with Apache Flume and Kafka for collecting, aggregating, and moving large volumes of data from sources such as web servers and telnet sources.
- Involved in understanding business and data needs, analyzing multiple data sources, and documenting data mappings to meet those needs.
- Conversant with web application servers such as Tomcat, WebSphere, WebLogic, and JBoss.
- Experience in data cleansing and data analysis.
- Monitored databases and performed backup and restore operations.
- Worked on Agile methodology and participated in scrums.
- Extensive experience in Requirements gathering, Analysis, Design, Reviews, Coding and Code Reviews, Unit and Integration Testing.
- Good knowledge of Web Services, SOAP programming, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.
- Flexible, enthusiastic, and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
- Experience in building, deploying, and integrating applications with ANT and Maven.
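A minimal HiveQL sketch of the dynamic partitioning and bucketing mentioned above (the sales table, its columns, and the S3 location are illustrative, not taken from a specific engagement):

```sql
-- Illustrative DDL: bucketed, partitioned external Hive table on S3
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION 's3://example-bucket/warehouse/sales/';

-- Enable dynamic partitioning before the load
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Partitions are derived from the order_date column in the data
INSERT OVERWRITE TABLE sales PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date
FROM staging_sales;
```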
Hadoop Eco System: Spark, Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Airflow, AWS S3, EMR, EC2, Impala, Kafka, YARN, Hue, HBase, Cassandra, ZooKeeper, Oozie, Flume, Cloudera Manager.
Operating Systems: Windows 8/7/2000, UNIX, MS-DOS, Linux
Database: Oracle, IBM DB2, MS SQL Server 2012/2008, MySQL, Teradata
NoSQL Databases: MongoDB, Cassandra, HBase
Programming Skills: PL/SQL, SQL, Shell Scripts, Python, C, Java, Pig Latin, HiveQL, J2EE, JSP, Servlets
Data Modeling Tools: ERwin 8/7
Methodologies: Rational Unified Process (RUP), Waterfall, Spiral, Agile
Version Control System: GitHub, CVS, SVN
Confidential, Chicago, IL
- Ingested data from SQL Server and Oracle into AWS S3 using Sqoop.
- Implemented workarounds to avoid S3 eventual-consistency issues.
- Implemented the ETL process in Spark (PySpark) using DataFrames on EMR.
- Worked with JSON data and loaded it into Hive tables.
- Implemented incremental merging in Hive.
- Used dynamic partitioning and bucketing in Hive.
- Wrote Python scripts to automate processes.
- Created efficient Spark jobs using data caching, coalesce, and repartition to improve performance.
- Created HDFS staging tables to improve throughput when writing Parquet files to S3.
- Performance tuned Hive and Spark jobs.
- Created partitions and buckets based on state for further processing using bucket-based Hive joins.
- Worked on multiple projects simultaneously.
- Developed Spark streaming applications for consuming the data from Kafka topics.
- Developed end-to-end data processing pipelines, from receiving data through the distributed messaging system Kafka to persisting the data into HBase.
- Worked on various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Used Airflow for scheduling jobs.
- Created custom input adapters for pulling raw clickstream data from FTP servers and AWS S3 buckets.
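The incremental merge pattern mentioned above can be sketched in HiveQL roughly as follows (the table and column names customer_base, customer_incr, customer_reconciled, and updated_at are hypothetical):

```sql
-- Sketch of an incremental merge: union the base table with the new
-- increment, keep only the latest version of each key, and write the
-- result to a reconciled table.
INSERT OVERWRITE TABLE customer_reconciled
SELECT id, name, email, updated_at
FROM (
  SELECT id, name, email, updated_at,
         ROW_NUMBER() OVER (PARTITION BY id
                            ORDER BY updated_at DESC) AS rn
  FROM (
    SELECT id, name, email, updated_at FROM customer_base
    UNION ALL
    SELECT id, name, email, updated_at FROM customer_incr
  ) merged
) ranked
WHERE rn = 1;
```

Writing to a separate reconciled table (rather than overwriting the base table in place) keeps the base intact until the merge is verified.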
Environment: AWS, Cloudera, Hive, HDFS, Spark, NiFi, Storm, Kafka, Sqoop, HBase, Oozie, Java, Maven, Eclipse, Cassandra.
Confidential, Denver, CO
Big Data Developer
- Worked with HDFS, Hive, Sqoop, Spark, Oozie.
- Loaded data from Relational databases to HDFS using Sqoop.
- Developed and integrated components to run and validate batch jobs on AWS EMR.
- Worked extensively with Hive.
- Created UDFs and UDAFs in Hive.
- Created Hive tables to load large sets of structured, semi-structured, and unstructured data coming from a variety of portfolios.
- Worked on a project focused on Hadoop/Hive on AWS, using both EMR and non-EMR Hadoop.
- Worked on the Cloudera distribution.
- Converted Pig scripts to Spark scripts and migrated the processes running on CDH to EMR.
- Worked on syncing an Oracle RDBMS to Hadoop while retaining Oracle as the main data store.
- Created External and Internal tables and partitioned appropriately based on the correct column to improve the performance.
- Created partitioned tables and loaded data using both static and dynamic partition methods.
- Used Impala to analyze the data present in Hive tables.
- Created Views on Hive tables.
- Designed and developed REST web service for validating address.
- Collected data from Twitter using Flume.
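The static and dynamic partition loads described above differ only in whether the partition value is named explicitly; a rough HiveQL sketch (the tx and staging tables and the ds column are hypothetical):

```sql
-- Static partition load: the target partition is named explicitly
INSERT OVERWRITE TABLE tx PARTITION (ds = '2017-01-01')
SELECT id, amount FROM staging WHERE ds = '2017-01-01';

-- Dynamic partition load: partitions are derived from the data itself
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE tx PARTITION (ds)
SELECT id, amount, ds FROM staging;
```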
Environment: AWS, CDH, HDFS, Hadoop, Pig, Hive, HBase, Sqoop, Flume, MapReduce, Podium Data, Oozie, Java 6/7, Oracle 10g, YARN, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA
- Installed and configured Hadoop ecosystem components and Cloudera Manager using the CDH distribution.
- Developed multiple MapReduce jobs in Java for complex business requirements, including data cleansing and preprocessing.
- Developed Sqoop scripts to import/export data from Oracle to HDFS and into Hive tables.
- Worked on analyzing Hadoop clusters using Big Data analytic tools including MapReduce, Pig, and Hive.
- Developed Pig scripts to process unstructured data and store it in HDFS.
- Involved in creating tables in Hive and writing scripts and queries to load data into Hive tables from HDFS.
- Scripted complex HiveQL queries on Hive tables for analytical functions.
- Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
- Worked on Hive/HBase vs. RDBMS trade-offs; imported data to Hive and created internal and external tables, partitions, indexes, views, queries, and reports for BI data analysis.
- Developed custom Java record readers, partitioners, and serialization techniques.
- Used different data formats (text and Avro) while loading the data into HDFS.
- Created tables in HBase and loaded data into them.
- Developed scripts to load data from HBase into the Hive metastore and perform MapReduce jobs.
- Created custom UDFs in Pig and Hive.
- Created partitioned tables and loaded data using both static partition and dynamic partition methods.
- Installed the Oozie workflow engine and scheduled it to run date/time-dependent Hive and Pig jobs.
- Designed and developed Dashboards for Analytical purposes using Tableau.
- Ran JSON scripts using Java with the Maven repository.
- Used Jenkins to integrate the Maven build with the source tree.
- Analyzed the Hadoop log files using Pig scripts to track errors.
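The partitioning/bucketing and map-side join optimizations above can be sketched as Hive session settings plus a join over a bucketed fact table (the table names and the 25 MB small-table threshold are illustrative):

```sql
-- Let Hive convert joins against small tables into map-side joins
SET hive.auto.convert.join = true;
SET hive.mapjoin.smalltable.filesize = 25000000;
-- Use bucket map joins when both tables are bucketed on the join key
SET hive.optimize.bucketmapjoin = true;

-- Hypothetical join: bucketed fact table against a small dimension
SELECT f.order_id, d.state_name
FROM orders_bucketed f
JOIN dim_state d
  ON f.state_code = d.state_code;
```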
Environment: HDFS, Map Reduce, Hive, Sqoop, Pig, HBase, Oozie, CDH distribution, Java, Eclipse, Shell Scripts, Windows, Linux.
Confidential, Portland, OR
- Performed requirement gathering, design, coding, testing, implementation and deployment.
- Worked on modeling of dialog processes and business processes, and coded Business Objects, Query Mapper, and JUnit files.
- Involved in the design and creation of class diagrams, sequence diagrams, and activity diagrams using UML models.
- Created the Business Object methods using Java and integrated them with the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Worked in web services using SOAP, WSDL.
- Wrote Query Mappers and JUnit test cases for MQ.
- Managed software configuration using ClearCase and SVN.
- Designed, developed, and tested features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed a test environment for testing all the web services exposed as part of the core module and their integration with partner services in integration testing.
- Analyzed user requirement documents and developed test plans, including test objectives, test strategies, test environment, and test priorities.
- Assisted with the development of the call center's operations, quality and training processes. (Enterprise Contact Center Services)
- Responsible for performing end-to-end system testing of the application and writing JUnit test cases.
- Performed functional, performance, integration, regression, smoke, and User Acceptance Testing (UAT).
- Used Jenkins for building and configuring the Java application using Maven.
- Involved in designing and programming the system, which includes development of Process Flow Diagram, Data Flow Diagram, Entity Relationship Diagram and Database Design.
- Designed front end components using JSF.
- Involved in the login, transaction, and reporting modules; customized report generation using controllers; and tested, debugged, and documented the modules developed.
- Implemented MVC architecture using Java, JSTL and Custom tag libraries.
- Involved in developing POJO classes and writing Hibernate Query Language (HQL) queries.
- Assisted in developing Java APIs that communicate with the Java Beans.
- Implemented MVC architecture and DAO design pattern for maximum abstraction of the application and code reusability.
- Used XML, XSL for Data presentation, Report generation and customer feedback documents.
- Created Stored Procedures using SQL/PL-SQL for data modification.
- Extensively used Java Beans to automate the generation of Dynamic Reports and for customer transactions.
- Developed JUnit test cases for regression testing and integrated with ANT build.
- Implemented Logging framework using Log4J.
- Involved in code review and documentation review of technical artifacts.
- Maintained 400+ production databases
- Ensured 100% database uptime
- Monitored database startup and shutdown
- Performed database backup and restoration
- Handled disaster recovery
- Performed fine tuning such as dump and load and index rebuilds
- Performed incremental DF (schema) uploads
- Monitored before-image (BI) and after-image (AI) files
- Refreshed test databases
- Created and modified scripts
- Created WebSpeed brokers
- Installed Progress (service packs and patches)
- Configured WebSpeed and Apache on UNIX servers
- Handled call management
- Created development and test databases
- Prepared monthly reports on databases
- Trained junior DB resources
- Interacted with developers and technical writers
- Scheduled cron jobs
- Monitored server space utilization
Environment: Progress DBA, Progress 4GL, MFG/PRO, Shell Scripts, Windows, Linux, UNIX, Apache Tomcat Server, IBM AIX Server.