Hadoop Developer Resume
SUMMARY
- 9+ years of diverse experience in Information Technology, including 4+ years of Hadoop development and design and several years of Java background. Also worked as a Quality Analyst, with experience in the development and implementation of various applications.
- Hands-on experience in designing and implementing a complete end-to-end Hadoop infrastructure using MapReduce, YARN, HDFS, Pig, Hive, Sqoop, HBase, Kafka, Flume, Apache Solr, Titan Graph DB, Java Spring Boot, Hue, and Apache Phoenix.
- Experience with continuous integration/deployment (CI/CD) tools such as Maven, Jenkins, Bamboo, and UCD.
- Experience with core Apache Hadoop components such as HDFS, Sqoop, MapReduce, Hive, HBase, and Spark with Scala.
- Experience with additional Apache Hadoop ecosystem components such as Ambari, Hue, Oozie, IBM BIGSQL, and IBM Maestro.
- Expertise in creating Hive internal/external tables and views using a shared metastore, and in writing HiveQL scripts.
- Experience in designing and implementing data warehouse systems: gathering requirements, data modeling, developing ETL, and building test cases.
- Good experience with the Guidewire PolicyCenter API.
- Expertise in data load management, importing and exporting data using Sqoop.
- Good knowledge of ETL procedures to transform data in intermediate tables according to business rules and functional requirements.
- Excellent hands-on experience in unit testing, integration testing, and functionality testing.
- Automated jobs using shell scripting; wrote scripts in Bash and Perl.
- Used various Hive file formats such as ORC, RCFile, and text, with appropriate compression, depending on the requirement.
- Good experience with methodologies and processes such as Waterfall, Agile, and Scrum.
- Excellent experience across testing types: Smoke, Functional, Integration, GUI, Regression, System, Compatibility, Performance, Acceptance, Security, Stress, and Black Box testing.
- Exposure to programming and scripting languages such as Java, JavaScript, HTML, and CSS.
- Hands-on experience with XPath, Firebug, and FirePath.
- Good experience testing web services (SOAP, REST) with SoapUI, including services used in Spring Batch jobs.
- Tested desktop and web-based applications backed by SQL Server/Oracle databases. Expertise with tools such as HP Quality Center, VersionOne, ALM, and JIRA.
TECHNICAL SKILLS
Hadoop ecosystem: HDFS, MapReduce, Sqoop, Hive, Flume, Spark, ZooKeeper, Oozie, Ambari, Hue, YARN, Cloudera, HBase, Solr, JanusGraph
IDEs & Utilities: Eclipse, IntelliJ, JCreator, NetBeans
Automation Test Tools: Selenium IDE/RC/WebDriver, ScalaTest, JUnit, TestNG, FirePath, Firebug, Docker, SoapUI, Git, UCD
Defect Tracking Tools: HPSM, Team Foundation Server (2012, 2015), Quality Center 9.0/10.0, MTM, JIRA
Programming/Scripting Languages: C/C++, Python, JavaScript, Shell scripting, Scala, Spark shell, Spring Boot
Operating Systems: Windows, Linux/Unix
RDBMS: SQL Server, MySQL, Oracle
Testing Methodologies: Agile, Waterfall, V-Model
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer
Responsibilities:
- Built a data ingestion framework (MDFE) to move data from external sources (DB2, IBM MQ, Informatica, SQL Server) into big data stores (Hive, HBase, Solr, Titan/Janus) using Apache Spark, Spring Boot, and Spark SQL.
- Worked with the Titan/JanusGraph database, creating vertices, edges, and search traversals to fetch claim/member details for a given person or group in real-time services (see the graph-traversal sketch at the end of this section).
- Worked with connecting partners (GSSP/SPI or API) to gather requirements and to build and support the required framework.
- Created HBase/NoSQL designs incorporating Slowly Changing Dimensions (SCD) to maintain a history of updates.
- Created Hive designs incorporating Slowly Changing Dimensions (Type 2 and Type 4) to maintain a history of updates (a Type 2 sketch appears at the end of this section).
- Developed Sqoop scripts to extract large historical and incremental datasets from legacy applications (traditional warehouses such as Oracle, DB2, and SQL Server) into the big data platform for the Data and Analytics business.
- Created the low-level design for the ingestion, transformation, and processing of daily business datasets using Sqoop, Hive, Spark, IBM BIGSQL, and IBM Maestro.
- Automated and deployed code to production using CI/CD tools such as Bitbucket, Bamboo, and UrbanCode Deploy.
- Created various database objects and performed fine-tuning of SQL tables and queries.
- Analyzed data from multiple systems of record (SOR), including disability/absence claims, members, treatments (ICD codes), and payments; also involved in design and architecture discussions for the project.
- Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
- Created Hive tables on top of HBase tables.
- Created Hive tables, loaded data, and performed analysis using Hive queries.
- Wrote a Scala script to read CSV files, process them, and load the corresponding JSON output to HDFS (a minimal version is sketched at the end of this section).
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Hands-on expertise in data deduplication and data profiling for many production tables.
- Worked with the Spark ecosystem, using Scala and Hive queries on data formats such as text and CSV.
- Strong knowledge and experience in big data, developing analytics applications using the Hadoop ecosystem, Scala, and Apache Spark.
- Developed a Spark Scala program to transform semi-structured and unstructured data into structured targets using Databricks and Spark SQL.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet business requirements.
- Loaded data by batch processing of data sources using Apache Spark.
Environment: Scala, Spark, Janus, Titan, HBase, Solr, Spark SQL, HDFS, Hive, Linux, IntelliJ, Hue, Sqoop, Oracle, Shell Scripting, YARN, Ambari, and Hortonworks.
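The graph lookups above can be illustrated with a minimal Gremlin-over-JanusGraph sketch in Scala. The schema (a "member" vertex keyed by memberId with outgoing "filedClaim" edges to "claim" vertices) and the config path are hypothetical placeholders, not the project's actual model:

```scala
import org.janusgraph.core.{JanusGraph, JanusGraphFactory}
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource

object MemberClaimLookup {
  def main(args: Array[String]): Unit = {
    // Open an HBase-backed graph from a properties file (path is illustrative).
    val graph: JanusGraph = JanusGraphFactory.open("conf/janusgraph-hbase.properties")
    val g: GraphTraversalSource = graph.traversal()

    // Fetch claim details for one member: start at the member vertex,
    // walk outgoing "filedClaim" edges, return selected claim properties.
    val claims = g.V()
      .has("member", "memberId", "M12345")
      .out("filedClaim")
      .valueMap[Any]("claimId", "status")
      .toList

    claims.forEach(c => println(c))
    graph.close()
  }
}
```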
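For the SCD designs, a minimal Type 2 sketch using Spark SQL with Hive support is shown below. Table and column names (member_dim, member_stg, eff_start_dt, is_current, etc.) are assumptions for illustration. The pattern rebuilds the dimension into a new table (avoiding an overwrite of a table being read), expiring changed current rows and inserting fresh current versions:

```scala
import org.apache.spark.sql.SparkSession

object ScdType2Load {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scd2-load")
      .enableHiveSupport()
      .getOrCreate()

    // Rebuild the dimension into a new table, then swap it in
    // (e.g. via ALTER TABLE ... RENAME) as a final step.
    spark.sql("""
      CREATE TABLE member_dim_new STORED AS ORC AS
      -- carry existing rows forward, expiring current rows that changed
      SELECT d.member_id, d.name, d.address, d.eff_start_dt,
             CASE WHEN s.member_id IS NOT NULL AND d.is_current = 'Y'
                  THEN current_date() ELSE d.eff_end_dt END AS eff_end_dt,
             CASE WHEN s.member_id IS NOT NULL AND d.is_current = 'Y'
                  THEN 'N' ELSE d.is_current END AS is_current
      FROM member_dim d
      LEFT JOIN member_stg s
        ON d.member_id = s.member_id
       AND (d.name <> s.name OR d.address <> s.address)
      UNION ALL
      -- add a fresh current version for new and changed members
      SELECT s.member_id, s.name, s.address,
             current_date(), CAST('9999-12-31' AS DATE), 'Y'
      FROM member_stg s
      LEFT JOIN member_dim d
        ON s.member_id = d.member_id AND d.is_current = 'Y'
       AND s.name = d.name AND s.address = d.address
      WHERE d.member_id IS NULL
    """)

    spark.stop()
  }
}
```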
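The CSV-to-JSON script can be summarized by the sketch below. The paths, the header option, the claim_status filter column, and the load_date partition column are all assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object CsvToJson {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("csv-to-json").getOrCreate()

    // Read raw CSV extracts (illustrative path; schema inferred from data).
    val claims = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/claims/*.csv")

    // Drop incomplete records, then write JSON partitioned by load_date
    // so downstream Hive queries can prune partitions.
    claims
      .filter(claims("claim_status").isNotNull)
      .write
      .mode("overwrite")
      .partitionBy("load_date")
      .json("hdfs:///data/curated/claims_json")

    spark.stop()
  }
}
```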
Confidential, TX
Hadoop Developer
Responsibilities:
- Explored Spark to improve performance and optimize existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN (see the aggregation sketch at the end of this section).
- Implemented test scripts to support test driven development and continuous integration.
- Involved in scheduling the Oozie workflow engine to run multiple Spark jobs.
- Developed Spark data transformation programs based on data mappings.
- Experienced with GitHub, using Git Bash to submit code to GitHub repositories.
- Developed solutions to pre-process large sets of structured data in different file formats (text, XML, and JSON files).
- Experienced with batch processing of data sources using Apache Spark.
- Involved in production support, resolving issues promptly so that data flows to downstream systems on time.
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Deployed code using UrbanCode Deploy, creating a fully automated build and deployment platform and coordinating code builds, promotions, and orchestrated deployments using Jenkins, Oozie, and Git.
- Experienced with Docker, containerizing the application and all its dependencies by writing Dockerfiles and Docker Compose files. Designed, wrote, and maintained Python systems for administering Git. Used Jenkins and Bamboo as full-cycle continuous delivery tools for package creation, distribution, and deployment onto Hadoop servers, with embedded shell scripts.
- Developed scripts for build, deployment, maintenance, and related tasks to implement a CI (Continuous Integration) system using Jenkins, Docker, Maven, Python, and Bash.
- Developed and executed Bash, shell, Groovy, and Python scripts; worked closely with teams to ensure high-quality, timely delivery of builds and releases.
- Automated various business processes and created monitoring scripts in Python that check for issues and close them automatically when manual intervention is not required.
- Worked extensively with Bamboo and Docker for continuous integration and end-to-end automation of all builds and deployments.
Environment: Python, Scala, Spark, Spark SQL, HDFS, Hive, Linux, IntelliJ, Oozie, Hue, Oracle, Shell Scripting, Hortonworks, UCD (UrbanCode Deploy), Docker, Jenkins, Git Bash, VersionOne
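The Spark optimization work above can be illustrated with a minimal sketch contrasting a pair-RDD aggregation with its DataFrame/Spark SQL equivalent. The input path, tab-delimited layout, and column positions are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object ClaimTotals {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("claim-totals").getOrCreate()
    import spark.implicits._
    val sc = spark.sparkContext

    // Pair-RDD version of a classic MapReduce sum; reduceByKey combines
    // values map-side before the shuffle, which is the key win over groupByKey.
    val totals = sc.textFile("hdfs:///data/claims/*.txt")   // illustrative path
      .map(_.split('\t'))
      .map(f => (f(0), f(2).toDouble))                      // (memberId, amount)
      .reduceByKey(_ + _)

    // DataFrame/Spark SQL view of the same aggregate, planned by Catalyst.
    totals.toDF("member_id", "total_amount")
      .createOrReplaceTempView("claim_totals")
    spark.sql("SELECT * FROM claim_totals ORDER BY total_amount DESC LIMIT 10").show()

    spark.stop()
  }
}
```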
Confidential
Hadoop Developer
Responsibilities:
- Developed Spark code in Scala for data processing.
- Migrated legacy application to Big Data application using Sqoop/Hive/Spark/HBase.
- Implemented proofs of concept on Spark and Scala for various source data (XML/JSON) transformations and processed data using Spark SQL.
- Created Hive tables, loaded data, and performed analysis using Hive queries.
- Implemented partitioning, dynamic partitions, and bucketing in Hive for efficient data access.
- Hands-on expertise in data deduplication and data profiling for many production tables.
- Worked with the Spark ecosystem, using Scala and Hive queries on data formats such as text and CSV.
- Implemented Spark best practices such as partitioning, caching, and checkpointing for faster data access (see the sketch at the end of this section).
- Involved in HBase schema key design and in Solr design for contextual search features for a POC.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed Scripts and Batch Job to schedule various Hadoop Program.
- Write Hive queries for data analysis to meet the business requirements.
- Explored Spark to improve performance and optimize existing Hadoop MapReduce algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed solutions to pre-process large sets of structured data in different file formats (text, Avro, Sequence, XML, JSON, ORC, and Parquet files).
- Attended weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
- Experienced with batch processing of data sources using Apache Spark.
- Migrated tables from text format to ORC and other customized file formats during data ingestion (a CTAS sketch appears at the end of this section).
- Designed and documented project use cases, wrote test cases, led the offshore team, and interacted with the client.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (an example appears at the end of this section).
- Responsible for managing data from multiple sources; loaded and transformed large sets of structured and semi-structured data.
Environment: Scala, Spark, Spark Streaming, Spark SQL, HDFS, Hive, Pig, Linux, Eclipse, Oozie, Hue, Apache Kafka, Sqoop, Oracle, Shell Scripting, YARN, Ambari, and Hortonworks.
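The caching and checkpointing practices above can be shown in a minimal sketch; the checkpoint directory, input path, and record layout are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheCheckpointDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cache-checkpoint").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")   // illustrative path

    val events = sc.textFile("hdfs:///data/events/*.txt")
      .map(_.split(','))
      .filter(_.length > 3)

    // Cache: the cleansed RDD is reused by several downstream actions.
    events.persist(StorageLevel.MEMORY_AND_DISK)

    // Checkpoint: truncates a long lineage so recovery after a failure
    // does not replay every upstream transformation.
    events.checkpoint()

    println(s"rows=${events.count()}, distinct keys=${events.map(_(0)).distinct().count()}")
    spark.stop()
  }
}
```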
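The text-to-ORC migration reduces to a CTAS statement when done through Spark SQL with Hive support; the table names below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TextToOrcMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-to-orc")
      .enableHiveSupport()
      .getOrCreate()

    // ORC gives columnar storage, predicate pushdown, and far better
    // compression than delimited text (illustrative table names).
    spark.sql("""
      CREATE TABLE IF NOT EXISTS claims_orc
      STORED AS ORC
      AS SELECT * FROM claims_text
    """)

    spark.stop()
  }
}
```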
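Converting a Hive query into RDD transformations typically follows the pattern sketched below; the claims table and status column are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object HiveToRdd {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-rdd")
      .enableHiveSupport()
      .getOrCreate()

    // Hive version of the logic:
    //   SELECT status, COUNT(*) FROM claims GROUP BY status
    // Equivalent Spark RDD transformations:
    val counts = spark.table("claims").rdd
      .map(row => (row.getAs[String]("status"), 1L))
      .reduceByKey(_ + _)

    counts.collect().foreach { case (status, n) => println(s"$status\t$n") }
    spark.stop()
  }
}
```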