Big Data / Hadoop Developer Resume
Atlanta-GA
SUMMARY:
- 7+ years of experience as a Software Engineer with a major focus on Big Data technologies: Hadoop Ecosystem, HDFS, MapReduce, HBase, Hive, Sqoop, Kafka, Oozie, Spark, Teradata, Java, Spring, Microservices.
- Involved in all the phases of Software Development Life Cycle (SDLC): Requirements gathering, analysis, design, development, testing, production and post-production support.
- Experience in writing MapReduce programs for analyzing Big Data in structured and unstructured formats.
- Developed MapReduce jobs based on the use cases using Java, MapReduce, Pig and Hive.
- Experience in loading data using Hive and writing scripts for data transformations using Hive and Pig.
- Knowledge and working experience in developing Apache Spark programs using Scala.
- Good knowledge of using Spark SQL to load tables into HDFS and run select queries on top of them.
- Hands-on experience with message brokers such as Apache Kafka.
- Developed UDFs and used them in Hive queries.
- Hands-on experience with NoSQL databases such as Cassandra and MongoDB.
- Developed Pig Latin scripts for handling business transformations.
- Implemented Sqoop for large dataset transfer between Hadoop and RDBMS.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs.
- Knowledge of installation and administration of multi-node virtualized clusters using Cloudera Hadoop and Apache Hadoop.
- Experience setting up instances behind Elastic Load Balancer in AWS for high availability.
- Experience in working with CI/CD pipelines using tools like Jenkins and Chef.
- Worked on Jenkins for continuous integration and end-to-end automation of builds and deployments, managing plugins such as Maven and Ant.
- Wrote cookbooks in Chef to automate system operations.
- Hands-on experience in SCM tools like Git and SVN for merging and branching.
- Knowledge of working with continuous deployment tools like Chef.
- Good understanding of the OpenShift platform in managing Docker containers using Docker Swarm and Kubernetes clusters.
- Excellent communication skills, configuration skills and technical documentation skills.
- Ability to work closely with teams to ensure high-quality, timely delivery of builds and releases.
- Excellent relationship management skills & ability to conceive efficient solutions utilizing technology. Industrious individual who thrives on a challenge, working effectively with all levels of management.
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, HDFS, Pig, Hive, MapReduce, Cassandra, Kafka, Spark, Teradata.
Cloud Platform: AWS and OpenStack.
Configuration Management: Chef, Puppet, Vagrant, Maven, Ansible, Docker, Gradle, Splunk, OpsWorks.
Continuous Integration Tools: NPM, Grunt, Gulp, Jenkins, JIRA.
Web Servers: Apache, Tomcat, WebSphere, Nginx, JBoss.
Database: Oracle, DB2, MySQL, MongoDB, SQL Server, MS SQL, Kubernetes.
Scripting Languages: JavaScript, Python, Shell, C, HTML, Bash, PHP.
Build Tools: Ant, Maven, Makefile, Hudson, Jenkins, Bamboo, CodeDeploy.
Version Control Tools: Subversion (SVN), ClearCase, Git, GitHub, Perforce, Cassandra, CodeCommit.
SDLC: Agile, Scrum.
Web Technologies: HTML, CSS, JavaScript, jQuery, Bootstrap, XML, JSON, XSD, XSL, XPath.
Operating Systems: Red Hat, Linux, Windows, CentOS.
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta-GA
Big Data / Hadoop Developer
Responsibilities:
- Responsible for developing applications on the Data Lake per client requirements and exposing that data to the client.
- Developing code to move data from one zone to another in the Data Fabric platform.
- Creating applications such as Claims Sweep WGS to transform data per client requirements using Spark, Hive and Python.
- Developing automation scripts for validations such as record count and schema checks, and loading the data into the corresponding partitions.
- Developing programs on UNIX to validate data after it is ingested into the Data Lake.
- Developing scripts to generate reconciliation reports using Python.
- Involved in moving data from different source systems such as Oracle, SQL and DB2 to the Data Lake.
- Identifying the layout of COBOL copybooks, cleaning up the copybooks and ingesting the data per the layout.
- Ingesting data from Oracle Database through Oracle GoldenGate into the Hadoop Data Lake with the help of Kafka.
- Responsible for providing technical solutions to issues the team faces.
- Responsible for guiding the team when they have issues.
- Responsible for providing design and architecture to the team for developing applications.
- Responsible for reviewing code and bringing it in line with the Anthem standards.
- Creating the data model for the data to be ingested for each table.
- Identifying the appropriate file formats for the tables to retrieve the data faster.
- Choosing column data types carefully so that no data is lost or missed.
- Responsible for proposing a testing strategy and developing the code for the audit framework.
- Gathering the issues that all development teams face in testing.
- Developing the audit framework to cover the common issues, so that quality data is ingested even when testers overlook them.
- Identifying the user stories for the requirements/epics.
- Responsible for identifying the user stories or work items for the initiatives in PI planning.
- Responsible for developing masking algorithms for PHI columns so that data can be exposed to the offshore team without revealing the actual values.
- Creating views that mask the PHI columns of each table, so that the offshore team cannot see the PHI data.
- Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Control-M jobs.
- Collected and aggregated large amounts of data from different sources using StreamSets (producer) for ingestion into Kafka, processed the real-time streaming data and stored it in HBase and HDFS.
Confidential - Capital One, Plano-TX
Big Data/Hadoop Java Developer
Responsibilities:
- Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job for daily data.
- Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
- Imported data from different sources such as HDFS/HBase into Spark RDDs.
- Developed Spark scripts using Python shell commands as per the requirements.
- Issued SQL queries via Impala to process the data stored in HDFS and HBase.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Used a RESTful web services API to connect to the MapR table and to develop the database connection.
- Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Experience in data migration from RDBMS to Cassandra. Created data models for customer data using the Cassandra Query Language.
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Experienced in developing Spark scripts for data analysis in both Python and Scala. Designed and developed various modules of the application with J2EE design architecture.
- Implemented modules using Core Java APIs and Java collections, and integrated the modules. Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers and brokers.
- Installed Kibana using Salt scripts and built custom dashboards that visualize important data stored in Elasticsearch.
- Used File System Check (FSCK) to check the health of files in HDFS and used Sqoop to import data from SQL Server to Cassandra.
- Streamed transactional data to Cassandra using Spark Streaming/Kafka.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Wrote ConfigMap and DaemonSet files to install Filebeat on Kubernetes pods, sending log files to Logstash or Elasticsearch so that different types of logs can be monitored in Kibana.
- Created a database on InfluxDB, worked on the interface created for Kafka and checked the measurements in the databases.
- Installed Kafka Manager to track consumer lag and monitor Kafka metrics; it was also used for adding topics, partitions, etc.
- Successfully generated consumer group lag from Kafka using its API.
- Ran log aggregation, website activity tracking and commit logs for distributed systems using Apache Kafka.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing. Loaded data from different sources (databases and files) into Hive using the Talend tool. Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Oozie and Zookeeper operational services for coordinating the cluster and scheduling workflows.
- Implemented Flume, Spark, and Spark Streaming framework for real time data processing.
Tools and Technology: Hadoop, HDFS, Pig, Hive, MapReduce, Agile, Cassandra, Kafka, Storm, AWS, YARN, Spark, ETL, Teradata, NoSQL, Oozie, Java, Talend, Linux, Kibana, HBase
Confidential - FusionStorm, CA
AWS Developer
Responsibilities:
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL and Scala; extracted large datasets from Cassandra and Oracle servers into HDFS and vice versa using Sqoop.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Developed Spark code in Scala, using Spark SQL/Streaming, for faster processing and testing of data.
- Created Hive tables to store the processed results in a tabular format.
- Implemented business logic based on state in Hive using Generic UDFs.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Used Pig in three distinct workloads: pipelines, iterative processing and research.
- Involved in moving all log files generated from various sources to HDFS for further processing through Kafka and Flume.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Extensively used Pig to communicate with Hive through HCatalog and with HBase through handlers.
- Implemented various MapReduce jobs in custom environments and loaded their output into HBase tables by generating Hive queries.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Imported data from different sources such as HDFS/HBase into Spark RDDs.
- Experience with Kafka and Storm, used for real-time analytics, and AML, used for data analytics.
Tools and Technology: Amazon Web Services, Chef, Vagrant, Scrum, Subversion (SVN), ANT, uDeploy, DB2, JIRA, Java, Confluence, Shell Scripts, WebSphere
Confidential - S.S.Techno Pvt Ltd, Hyderabad
Java Developer
Responsibilities:
- Involved in the full life cycle of the project: design, analysis, logical and physical architecture modeling, development, implementation and testing.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Designed and created ETL jobs in Talend to load huge volumes of data.
- Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
- Responsible for managing data coming from various sources.
- Involved in generating analytics for brand pages.
- Experienced in working with Apache Spark.
- Involved in Object Oriented Analysis and Design using UML, including development of class diagrams, use case diagrams, sequence diagrams and state diagrams.
- Developed the application using J2EE architecture.
- Developed the view pages in JSP with CSS, and validations using Servlets.
- Programmed various backend services using Java JDBC to access the Oracle database, establishing and reusing database connections, and wrote stored procedures.
- Used Struts validation, Struts custom tags and the Tiles framework in the presentation layer.
- Responsible for application builds and releases using ANT as the build tool, and for deploying the applications on WebLogic.
- Involved in the end-to-end coding and testing of the system, including writing unit test cases.
- Maintained the code repository using VSS and ClearCase to keep the codebase in sync with other phases of projects running simultaneously.
Tools and Technology: BEA WebLogic Server, IBM MQSeries, Eclipse, Java, JSP, Servlets, Struts 1.2, JDBC, ANT, HTML, CSS, Oracle 8i, TOAD, JavaScript, UML
