- 9+ years of experience in application development/architecture and data analytics, specializing in Java and Big Data technologies, with expertise in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, ZooKeeper, Impala and NoSQL databases.
- Experienced in data modeling using ER diagrams, dimensional data modeling, conceptual/logical/physical modeling in Third Normal Form (3NF), star schema modeling and snowflake modeling, using tools such as ER/Studio and CA Erwin for both forward and reverse engineering.
- Excellent experience with the Amazon, Cloudera and Hortonworks Hadoop distributions and with maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS); extensive experience with the Hadoop ecosystem and an in-depth understanding of MapReduce and the Hadoop infrastructure.
- Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Experienced in using Spark to improve the performance of existing algorithms in Hadoop and optimize them, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and YARN, and of the MapReduce programming paradigm.
- Experienced working with Hadoop Big Data technologies (HDFS and MapReduce programs), the Hadoop ecosystem (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experienced in using HBase, a column-oriented NoSQL database.
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experienced with major components of the Hadoop ecosystem, including Hive, Sqoop and Flume, and knowledgeable about the MapReduce/HDFS framework.
- Experience with BI/DW solutions (ETL, OLAP, data marts), Informatica, and BI reporting tools such as Tableau and QlikView.
- Experienced in applying MapReduce design patterns to solve complex MapReduce problems, with excellent working knowledge of Sqoop and Flume for data processing.
- Excellent knowledge of Talend Big Data integration for meeting business demands on Hadoop and NoSQL platforms.
- Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Experienced in Hadoop cluster maintenance, including data and metadata backups, file system checks, commissioning and decommissioning of nodes, and upgrades.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java, and hands-on programming experience with technologies such as Java, J2EE, HTML and XML.
- Strong experience in analyzing large data sets by writing Pig scripts and Hive queries; extensive experience working with structured data using HiveQL, join operations and custom UDFs, and in optimizing Hive queries.
- Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
- Experienced in importing and exporting data between HDFS and relational databases using Sqoop, with expertise in job workflow scheduling and monitoring tools such as Oozie.
- Experienced in using Apache Flume to collect, aggregate and move large volumes of data from sources such as web servers and telnet sources, and strong experience in architecting batch-style, large-scale distributed computing applications using tools such as Flume, MapReduce and Hive.
- Strong experience working with Teradata databases and proficiency in writing complex SQL and PL/SQL to create tables, views, indexes, stored procedures and functions.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features; worked on custom Pig loader and storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experienced with scripting languages such as Python and UNIX shell, with strong experience working in UNIX/Linux environments and writing shell scripts.
- Experience with branching, tagging and release activities in version control tools (SVN, GitHub); experienced in creating S3 buckets and managing S3 bucket policies, and in using S3 buckets and Glacier for storage, backup and archiving in AWS.
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager, NoSQL databases (MongoDB, HBase, Cassandra)
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Build and Development Tools: Maven, SQL Developer
Cloud Technologies: AWS S3, AWS EMR, AWS Glue, AWS EC2, Azure.
Programming & Scripting: Java, J2EE, HTML, JavaScript, jQuery, PL/SQL, C, SQL, Shell Scripting, Python.
Databases: Oracle, MySQL, MS SQL Server, Teradata
Operating Systems: Linux, UNIX, Mac OS X, Windows 8, Windows 7, Windows Server …
Sr. Big Data Developer/Engineer
Confidential, Chicago, IL
- Implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase and Hive.
- Managed and led the development effort with a diverse internal and overseas team, and designed, architected and implemented highly complex projects handling considerable data volumes (GB to PB scale).
- Designed and deployed an AWS Hadoop cluster through the full SDLC based on the client's business needs, and was involved in loading and transforming large sets of structured, semi-structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Designed the AWS architecture and cloud migration, including AWS EMR, DynamoDB, Redshift and event processing using Lambda functions.
- Performed data profiling and transformation on the raw data using Pig, Python and Java, and developed predictive analytics using the Apache Spark Scala APIs.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra), and was responsible for importing log files from various sources into HDFS using Flume.
- Analyzed data using HiveQL to generate per-payer reports for transmission to payers in the form of payment summaries.
- Imported millions of structured records from relational databases using Sqoop for processing with Spark, and stored the data in HDFS in CSV format.
- Used the DataFrame API in Scala to organize distributed collections of data into named columns.
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala and Hive to perform streaming ETL and apply machine learning.
- Explored Spark to improve the performance of existing algorithms in Hadoop and optimize them, using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Explored DAGs, their dependencies and logs using Airflow pipelines for automation, and used Apache Airflow to schedule and run the DAGs that execute code.
- Involved in big data analysis using Pig and user-defined functions (UDFs); created Hive external tables, loaded data into them and queried it using HQL.
- Implemented a Spark GraphX application to analyze guest behavior for data science segments.
- Enhanced a traditional data warehouse based on a star schema, updated data models, and performed data analytics and reporting using Tableau.
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Developed shell, Perl and Python scripts to automate Pig scripts and provide control flow for them.
- Developed a prototype for Big Data analysis using Spark RDDs, DataFrames and the Hadoop ecosystem with CSV, JSON, Parquet and HDFS files.
- Developed HiveQL scripts to perform transformation logic and to load data from the staging zone to the landing and semantic zones.
- Maintained and worked with a data pipeline that transfers and processes several terabytes of data using Spark, Scala, Python, Apache Kafka, Pig/Hive and Impala.
- Involved in creating Oozie workflow and coordinator jobs that kick off Hive jobs on time based on data availability, and used the Oozie scheduler to automate the pipeline workflow and orchestrate the Sqoop, Hive and Pig jobs that extract the data in a timely manner.
- Exported the generated results to Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Involved in scheduling multiple Hive and Pig jobs with the Airflow workflow engine using Python.
- Exported data from HDFS to RDBMS via Sqoop for Business Intelligence, visualization and user report generation.
- Responsible for designing and deploying EDW application solutions, optimizing processes, and defining and implementing best practices.
Sr. Big Data Developer/Engineer
Confidential, Charlotte, NC
- Responsible for installing and configuring Hive, Pig, HBase and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS using Scala.
- Built analytical data pipelines to move data in and out of Hadoop/HDFS from structured and unstructured sources, and designed and implemented the system architecture for an Amazon EC2-based cloud-hosted solution for the client.
- Developed solutions to process data into HDFS and analyzed the data using MapReduce, Pig and Hive to deliver summary results from Hadoop to downstream systems.
- Built servers on AWS: importing volumes, launching EC2 instances, and creating security groups, auto-scaling, load balancers, Route 53, SES and SNS in the defined virtual private cloud (VPC).
- Wrote MapReduce code to process and parse data from various sources and stored the parsed data in HBase and Hive using HBase-Hive integration.
- Designed and provisioned the platform architecture to execute Hadoop and machine learning use cases on cloud infrastructure (AWS EMR and S3).
- Developed Scala scripts and UDFs using PySpark, DataFrames/SQL and RDD/MapReduce in Spark 1.3 for data aggregation and queries, and wrote data back into the OLTP system directly or through Sqoop.
- Wrote Spark scripts to improve the performance of existing algorithms in Hadoop and optimize them, using SparkContext/SparkSession, Spark SQL, DataFrames and pair RDDs.
- Streamed AWS log groups into a Lambda function to create ServiceNow incidents.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzing them by running Hive queries and Pig scripts; created managed and external tables in Hive and loaded data from HDFS.
- Developed Spark code using Scala and Spark SQL for faster processing and testing, and ran complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to extract data from data files and load it into HDFS.
- Exported data to RDBMS servers using Sqoop and processed it for ETL operations.
- Used S3 buckets on AWS to store CloudFormation templates and created EC2 instances on AWS.
- Designed an ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using shell scripts, Sqoop, packages and MySQL.
- Performed data requirements analysis and data modeling, including dimensional data modeling using Erwin to support data warehouse design and ETL development, and established data architecture standards.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig and Sqoop.
- Implemented Hadoop on AWS EC2 using a few instances to gather and analyze log data files.
- Worked with Spark and Spark Streaming, creating RDDs and applying transformations and actions; created partitioned tables and loaded data using both static and dynamic partitioning.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Used Kafka for publish-subscribe messaging as a distributed commit log, with experience of its speed, scalability and durability.
- Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive; followed the Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala, and scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Involved in cluster maintenance, monitoring and troubleshooting, and in managing and reviewing data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Analyzed the Hadoop cluster and different Big Data analytic tools, including Pig, Hive, HBase and Sqoop.
- Researched, evaluated and adopted new technologies, tools and frameworks in the Hadoop ecosystem, and improved performance by tuning Hive and MapReduce.
Confidential, Minneapolis, MN
- Responsible for managing, analyzing and transforming petabytes of data, and for quick validation checks on FTP file arrival from an S3 bucket to HDFS.
- Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
- Involved in creating Hive tables and loading data incrementally into them using dynamic partitioning; worked with Avro files and JSON records.
- Involved in using Pig for data cleansing, and developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Worked with distributed frameworks such as Apache Spark and Presto on Amazon EMR and Redshift, and interacted with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Developed custom Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Worked on Hive by creating external and internal tables, loading them with data and writing Hive queries.
- Performed offline analysis on HDFS and sent the results to MongoDB databases to update information in the existing table; data was moved from Hadoop to MongoDB using MapReduce and Hive/Pig scripts through the Mongo-Hadoop connector.
- Involved in developing and using UDTFs and UDAFs for decoding and converting log record fields, generating minute buckets for specified time intervals, and extracting JSON fields.
- Responsible for debugging and optimizing Hive scripts and for implementing deduplication logic in Hive using a rank key function (UDF); developed Pig and Hive UDFs to analyze complex data and identify specific user behavior.
- Experienced in writing Hive validation scripts used in a validation framework (for daily analysis presented to business users through graphs).
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
- Involved in Cassandra database schema design, and pushed data to Cassandra databases using the bulk load utility.
- Responsible for creating dashboards on Tableau Server and generating reports on Hive tables for different scenarios using Tableau.
- Responsible for scheduling with ActiveBatch jobs and cron jobs, and involved in JAR builds triggered by commits to GitHub using Jenkins.
- Explored new data tagging tools such as Tealium (POC report).
- Provided upper management with daily updates on project progress, including the classification levels achieved on the data.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Oozie, Impala, Cassandra, Java (JDK 1.6), Cloudera, Oracle 11g/10g, Windows NT, UNIX Shell Scripting, Tableau, Tealium, AWS, S3, SQL, Python.
Sr. Java Developer
- Developed detailed design documents based on design discussions, and was involved in designing the database tables and Java classes used in the application.
- Involved in development, unit testing and system integration testing of the travel network builder side of the application.
- Involved in designing, developing and building the travel network file system stored on NAS drives.
- Set up a Linux environment to interact with the RouteSmart library (.so) file and to perform NAS drive file operations using JNI.
- Implemented and configured Hudson as the continuous integration server and Sonar to maintain code quality and remove redundant code.
- Extensively worked with Hibernate Query Language (HQL) to store and retrieve data from the Oracle database.
- Developed Java web applications using JSP, Servlets, Struts, Hibernate, Spring, REST web services and SOAP.
- Provided support in all phases of the software development life cycle (SDLC), quality management systems and project life cycle processes. Used databases such as MySQL, and followed HTTP and WSDL standards to design REST/SOAP-based web APIs using XML, JSON, HTML and DOM technologies.
- Involved in migrating the existing distributed JSP framework to the Struts framework, and in researching and designing with the Struts MVC framework.
- Worked with RouteSmart C++ code to interact with the Java application using SWIG and the Java Native Interface (JNI).
- Developed the user interface for requesting a travel network build using JSP and Servlets.
- Built business logic so that users can specify which version of the travel network files to use for the solve process.
- Used Spring Data Access Objects (DAOs) to access data through the data source, and built an independent property subsystem to ensure that each request always picks up the latest set of properties.
- Implemented a thread monitor system to monitor threads, and used JUnit for unit testing of the development modules.
- Wrote SQL queries and procedures for the application, interacted with third-party ESRI functions to retrieve map data, and built and deployed JAR, WAR and EAR files on dev and QA servers.
- Provided bug fixing (using Log4j for logging) and testing support after development, and prepared requirements and research for moving the map data to the Hadoop framework for future use.
- Involved in the analysis & design of the application using Rational Rose and developed the various action classes to handle the requests and responses.
- Used Hibernate framework to persist the employee work hours to the database.
- Developed classes that interface with the underlying web services layer, prepared documentation, and participated in preparing the user manual for the application.
- Prepared use cases, business process models, data flow diagrams and user interface models.
- Performed back-end, server-side coding and development using Java Collections (Set, List, Map), exception handling, Vaadin, Spring with dependency injection, the Struts framework, Hibernate, Servlets, Actions, ActionForms and JavaBeans.
- Responsible for enhancing the UI using HTML, JavaScript, XML, JSP and CSS per the requirements, and for implementing client-side validations using jQuery.
- Involved in writing application-level code to interact with APIs and web services using AJAX, JSON and XML.
- Wrote many JSPs for maintenance and enhancement of the application; worked on the front end using Servlets and JSP, and on the back end using Hibernate.
- Gathered and analyzed requirements for EAuto, designed process flow diagrams, defined business processes related to the project, and provided technical direction to the development workgroup.
- Analyzed the legacy and Financial Data Warehouse systems and participated in database design sessions and database normalization meetings.
- Managed change request management and defect management, managed UAT, developed test strategies and test plans, and reviewed QA test plans for appropriate test coverage.
- Involved in developing JSPs, action classes, form beans, response beans and EJBs, and extensively used XML to code configuration files.
- Developed PL/SQL stored procedures and triggers, and performed functional, integration, system and validation testing.