Hadoop Developer Resume
SUMMARY
- 9+ years of diverse experience in Information Technology, including 4+ years of Hadoop development and design and several years of Java background. Also worked as a Quality Analyst, with experience in the development and implementation of various applications.
- Hands-on experience in designing and implementing a complete end-to-end Hadoop infrastructure using MapReduce, YARN, HDFS, Pig, Hive, Sqoop, HBase, Kafka, Flume, Apache Solr, Titan Graph DB, Java Spring Boot, Hue, and Apache Phoenix.
- Experience with continuous integration/deployment (CI/CD) tools such as Maven, Jenkins, Bamboo, and UCD.
- Experience with core Apache Hadoop components such as HDFS, Sqoop, MapReduce, Hive, HBase, and Spark with Scala.
- Experience with additional Apache Hadoop ecosystem components such as Ambari, Hue, Oozie, IBM BIGSQL, and IBM Maestro.
- Expertise in creating Hive internal/external tables and views using a shared metastore, and in writing HiveQL scripts.
- Experience in designing and implementing data warehouse systems: gathering requirements, data modeling, developing ETL, and building test cases.
- Good experience with the Guidewire PolicyCenter API.
- Expertise in data load management, importing and exporting data using Sqoop.
- Good knowledge of ETL procedures to transform data in intermediate tables according to business rules and functional requirements.
- Excellent hands-on experience in unit testing, integration testing, and functionality testing.
- Automated jobs using shell scripting; wrote scripts in Bash and Perl.
- Used various Hive file formats such as ORC, RCFile, and text, with appropriate compression, depending on the requirement.
- Good experience with methodologies and processes such as Waterfall, Agile, and Scrum.
- Excellent experience across testing types: Smoke, Functional, Integration, GUI, Regression, System, Compatibility, Performance, Acceptance, Security, Stress, and Black Box testing.
- Exposure to programming and scripting languages such as Java, JavaScript, HTML, and CSS.
- Hands-on experience with XPath, Firebug, and FirePath.
- Good experience testing web services (SOAP, REST) with SoapUI, including services used in Spring Batch jobs.
- Tested desktop and web-based applications backed by SQL Server/Oracle databases. Expertise with tools such as HP Quality Center, VersionOne, ALM, and JIRA.
TECHNICAL SKILLS
Hadoop ecosystem: HDFS, MapReduce, Sqoop, Hive, Flume, Spark, ZooKeeper, Oozie, Ambari, Hue, YARN, Cloudera, HBase, Solr, JanusGraph
IDEs & Utilities: Eclipse, IntelliJ, JCreator, NetBeans
Automation Test Tools: Selenium IDE/RC/WebDriver, ScalaTest, JUnit, TestNG, FirePath, Firebug, Docker, SoapUI, Git, UCD
Defect Tracking Tools: HPSM, Team Foundation Server (2012, 2015), Quality Center 9.0/10.0, MTM, JIRA
Programming/Scripting Languages: C/C++, Python, JavaScript, Shell scripting, Scala, Spark shell, Spring Boot
Operating Systems: Windows, Linux/Unix
RDBMS: SQL Server, MySQL, Oracle
Testing Methodologies: Agile, Waterfall, V-Model
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer
Responsibilities:
- Built a data ingestion framework (MDFE) to move data from external sources (DB2, IBM MQ, Informatica, SQL Server) into big data stores (Hive, HBase, Solr, Titan/Janus) using Apache Spark, Spring Boot, and Spark SQL.
- Worked with the Titan/JanusGraph database, creating vertices, edges, and search traversals to fetch claim/member details for a given person or group in real-time services (see the graph-traversal sketch at the end of this section).
- Worked with connecting partners (GSSP/SPI or API) to gather requirements and to build and support the required framework.
- Created HBase/NoSQL designs incorporating Slowly Changing Dimensions (SCD) to maintain a history of updates.
- Created Hive designs incorporating Slowly Changing Dimensions (Type 2 and Type 4) to maintain a history of updates (a Type 2 sketch appears at the end of this section).
- Developed Sqoop scripts to extract large historical and incremental datasets from legacy applications (traditional warehouses such as Oracle, DB2, and SQL Server) into the big data platform for the Data and Analytics business.
- Created the low-level design for the ingestion, transformation, and processing of daily business datasets using Sqoop, Hive, Spark, IBM BIGSQL, and IBM Maestro.
- Automated and deployed code to production using CI/CD tools such as Bitbucket, Bamboo, and UrbanCode Deploy.
- Created various database objects and performed fine-tuning of SQL tables and queries.
- Analyzed data from multiple systems of record (SOR), including disability/absence claims, members, treatments (ICD codes), and payments; also involved in design and architecture discussions for the project.
- Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
- Created Hive tables on top of HBase tables.
- Created Hive tables, loaded data, and performed analysis using Hive queries.
- Wrote a Scala script to read CSV files, process them, and load the corresponding JSON output to HDFS (a minimal version is sketched at the end of this section).
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Hands-on expertise in data deduplication and data profiling for many production tables.
- Worked with the Spark ecosystem, using Scala and Hive queries on data formats such as text and CSV.
- Strong knowledge and experience in big data, developing analytics applications using the Hadoop ecosystem, Scala, and Apache Spark.
- Developed a Spark Scala program to transform semi-structured and unstructured data into structured targets using Databricks and Spark SQL.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Loaded data by batch processing of data sources using Apache Spark.
Environment: Scala, Spark, Janus, Titan, HBase, Solr, Spark SQL, HDFS, Hive, Linux, IntelliJ, Hue, Sqoop, Oracle, Shell Scripting, YARN, Ambari, and Hortonworks.
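The graph lookups above can be illustrated with a minimal Gremlin-over-JanusGraph sketch in Scala. The schema (a "member" vertex keyed by memberId with outgoing "filedClaim" edges to "claim" vertices) and the config path are hypothetical placeholders, not the project's actual model:

```scala
import org.janusgraph.core.{JanusGraph, JanusGraphFactory}
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource

object MemberClaimLookup {
  def main(args: Array[String]): Unit = {
    // Open an HBase-backed graph from a properties file (path is illustrative).
    val graph: JanusGraph = JanusGraphFactory.open("conf/janusgraph-hbase.properties")
    val g: GraphTraversalSource = graph.traversal()

    // Fetch claim details for one member: start at the member vertex,
    // walk outgoing "filedClaim" edges, return selected claim properties.
    val claims = g.V()
      .has("member", "memberId", "M12345")
      .out("filedClaim")
      .valueMap[Any]("claimId", "status")
      .toList

    claims.forEach(c => println(c))
    graph.close()
  }
}
```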
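For the SCD designs, a minimal Type 2 sketch using Spark SQL with Hive support is shown below. Table and column names (member_dim, member_stg, eff_start_dt, is_current, etc.) are assumptions for illustration. The pattern rebuilds the dimension into a new table (avoiding an overwrite of a table being read), expiring changed current rows and inserting fresh current versions:

```scala
import org.apache.spark.sql.SparkSession

object ScdType2Load {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scd2-load")
      .enableHiveSupport()
      .getOrCreate()

    // Rebuild the dimension into a new table, then swap it in
    // (e.g. via ALTER TABLE ... RENAME) as a final step.
    spark.sql("""
      CREATE TABLE member_dim_new STORED AS ORC AS
      -- carry existing rows forward, expiring current rows that changed
      SELECT d.member_id, d.name, d.address, d.eff_start_dt,
             CASE WHEN s.member_id IS NOT NULL AND d.is_current = 'Y'
                  THEN current_date() ELSE d.eff_end_dt END AS eff_end_dt,
             CASE WHEN s.member_id IS NOT NULL AND d.is_current = 'Y'
                  THEN 'N' ELSE d.is_current END AS is_current
      FROM member_dim d
      LEFT JOIN member_stg s
        ON d.member_id = s.member_id
       AND (d.name <> s.name OR d.address <> s.address)
      UNION ALL
      -- add a fresh current version for new and changed members
      SELECT s.member_id, s.name, s.address,
             current_date(), CAST('9999-12-31' AS DATE), 'Y'
      FROM member_stg s
      LEFT JOIN member_dim d
        ON s.member_id = d.member_id AND d.is_current = 'Y'
       AND s.name = d.name AND s.address = d.address
      WHERE d.member_id IS NULL
    """)

    spark.stop()
  }
}
```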
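The CSV-to-JSON script can be summarized by the sketch below. The paths, the header option, the claim_status filter column, and the load_date partition column are all assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object CsvToJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-to-json").getOrCreate()

    // Read raw CSV extracts (illustrative path; schema inferred from data).
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/claims/*.csv")

    // Drop incomplete records, then write JSON partitioned by load_date
    // so downstream Hive queries can prune partitions.
    claims
      .filter(claims("claim_status").isNotNull)
      .write
      .mode("overwrite")
      .partitionBy("load_date")
      .json("hdfs:///data/curated/claims_json")

    spark.stop()
  }
}
```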
Confidential, TX
Hadoop Developer
Responsibilities:
- Explored Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the aggregation sketch at the end of this section).
- Implemented test scripts to support test driven development and continuous integration.
- Involved in scheduling the Oozie workflow engine to run multiple Spark jobs.
- Developed Spark data transformation programs based on data mappings.
- Experienced with GitHub, using Git Bash to submit code to GitHub repositories.
- Developed solutions to pre-process large sets of structured data in different file formats (text, XML, and JSON files).
- Experienced with batch processing of data sources using Apache Spark.
- Involved in production support, resolving issues promptly so that data flows to downstream systems on time.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Deployed code using UrbanCode Deploy, creating a fully automated build and deployment platform and coordinating code builds, promotions, and orchestrated deployments using Jenkins, Oozie, and Git.
- Experienced with Docker, containerizing the application and all its dependencies by writing Dockerfiles and Docker Compose files. Designed, wrote, and maintained Python systems for administering Git. Used Jenkins and Bamboo as full-cycle continuous delivery tools for package creation, distribution, and deployment onto Hadoop servers, with embedded shell scripts.
- Developed scripts for build, deployment, maintenance, and related tasks to implement a CI (Continuous Integration) system using Jenkins, Docker, Maven, Python, and Bash.
- Developed and executed Bash, shell, Groovy, and Python scripts; worked closely with teams to ensure high-quality, timely delivery of builds and releases.
- Automated various business processes and created monitoring scripts in Python that check for issues and close them automatically when manual intervention is not required.
- Worked extensively with Bamboo and Docker for continuous integration and end-to-end automation of all builds and deployments.
Environment: Python, Scala, Spark, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oozie, Hue, Oracle, Shell Scripting, Hortonworks, UCD (UrbanCode Deploy), Docker, Jenkins, Git Bash, VersionOne
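The Spark optimization work above can be illustrated with a minimal sketch contrasting a pair-RDD aggregation with its DataFrame/Spark SQL equivalent. The input path, tab-delimited layout, and column positions are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object ClaimTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("claim-totals").getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    // Pair-RDD version of a classic MapReduce sum; reduceByKey combines
    // values map-side before the shuffle, which is the key win over groupByKey.
    val totals = sc.textFile("hdfs:///data/claims/*.txt")   // illustrative path
      .map(_.split('\t'))
      .map(f => (f(0), f(2).toDouble))                      // (memberId, amount)
      .reduceByKey(_ + _)

    // DataFrame/Spark SQL view of the same aggregate, planned by Catalyst.
    totals.toDF("member_id", "total_amount")
      .createOrReplaceTempView("claim_totals")
    spark.sql("SELECT * FROM claim_totals ORDER BY total_amount DESC LIMIT 10").show()

    spark.stop()
  }
}
```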
Confidential
Hadoop Developer
Responsibilities:
- Developed Spark code in Scala for data processing.
- Migrated legacy application to Big Data application using Sqoop/Hive/Spark/HBase.
- Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
- Created Hive tables, loaded data, and performed analysis using Hive queries.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Hands-on expertise in data deduplication and data profiling for many production tables.
- Worked with the Spark ecosystem, using Scala and Hive queries on data formats such as text and CSV.
- Implemented Spark best practices such as partitioning, caching, and checkpointing for faster data access (see the sketch at the end of this section).
- Involved in HBase schema key design and in Solr design for contextual search features for a POC.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed Scripts and Batch Job to schedule various Hadoop Program.
- Write Hive queries for data analysis to meet the business requirements.
- Explored Spark to improve performance and optimize existing Hadoop MapReduce algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed solutions to pre-process large sets of structured data in different file formats (text, Avro, Sequence, XML, JSON, ORC, and Parquet files).
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Experienced with batch processing of data sources using Apache Spark.
- Migrated tables from text format to ORC and other customized file formats during data ingestion (a CTAS sketch appears at the end of this section).
- Designed and documented project use cases, wrote test cases, led the offshore team, and interacted with the client.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (an example appears at the end of this section).
- Responsible for managing data from multiple sources; loaded and transformed large sets of structured and semi-structured data.
Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Apache Kafka, Sqoop, Oracle, Shell Scripting, YARN, Ambari, and Hortonworks.
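The caching and checkpointing practices above can be shown in a minimal sketch; the checkpoint directory, input path, and record layout are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheCheckpointDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-checkpoint").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")   // illustrative path

    val events = sc.textFile("hdfs:///data/events/*.txt")
      .map(_.split(','))
      .filter(_.length > 3)

    // Cache: the cleansed RDD is reused by several downstream actions.
    events.persist(StorageLevel.MEMORY_AND_DISK)

    // Checkpoint: truncates a long lineage so recovery after a failure
    // does not replay every upstream transformation.
    events.checkpoint()

    println(s"rows=${events.count()}, distinct keys=${events.map(_(0)).distinct().count()}")
    spark.stop()
  }
}
```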
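The text-to-ORC migration reduces to a CTAS statement when done through Spark SQL with Hive support; the table names below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TextToOrcMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-to-orc")
      .enableHiveSupport()
      .getOrCreate()

    // ORC gives columnar storage, predicate pushdown, and far better
    // compression than delimited text (illustrative table names).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS claims_orc
      STORED AS ORC
      AS SELECT * FROM claims_text
    """)

    spark.stop()
  }
}
```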
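Converting a Hive query into RDD transformations typically follows the pattern sketched below; the claims table and status column are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object HiveToRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-rdd")
      .enableHiveSupport()
      .getOrCreate()

    // Hive version of the logic:
    //   SELECT status, COUNT(*) FROM claims GROUP BY status
    // Equivalent Spark RDD transformations:
    val counts = spark.table("claims").rdd
      .map(row => (row.getAs[String]("status"), 1L))
      .reduceByKey(_ + _)

    counts.collect().foreach { case (status, n) => println(s"$status\t$n") }
    spark.stop()
  }
}
```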