
Big Data Hadoop Developer Resume


Irving, TX

SUMMARY

  • Big Data developer with 7+ years of professional IT experience, including about 4 years of Big Data experience in the health care, insurance, and product-related domains.
  • Extensive experience working with enterprise Hadoop distributions such as Cloudera and Hortonworks.
  • Experienced in developing and deploying big data applications on both Amazon Web Services and Microsoft Azure.
  • In-depth experience using Hadoop ecosystem tools such as HDFS, MapReduce, YARN, Pig, Sqoop, Spark, Storm, Kafka, Oozie, Elasticsearch, HBase, and ZooKeeper.
  • Extensive knowledge of Hadoop architecture and its components.
  • Exposure to Data Lake Implementation using Apache Spark.
  • Experience in designing technical solutions using object-oriented design concepts.
  • Experience delivering applications in highly available, scalable, and manageable environments using Google Cloud Platform (GCP) services and the private cloud PFC.
  • Developed data pipelines and applied business logic using Spark.
  • Well-versed in Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
  • Experience with CI/CD pipelines built using Jenkins and Terraform.
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Used Scala and Python to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Handled importing data from RDBMS into HDFS, and exporting it back, using Sqoop.
  • Good experience using Apache NiFi to automate data movement between different Hadoop systems.
  • Extensive experience in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka.
  • Designed and developed ETL integration patterns using Python on Spark (PySpark).
  • Deployed VMs, storage, networks, and resource groups through the Azure Portal.
  • Implemented MDM (master data management) and have good knowledge of using it.
  • Experience pulling and pushing data from different sources into Azure SQL DW using Azure Data Factory (ADF) v2.
  • Experience developing workflows, pipelines, and datasets using Azure Data Factory.
  • Created storage pools and striped disks for Azure Virtual Machines; backed up, configured, and restored Azure Virtual Machines using Azure Backup.
  • Experience in developing a data pipeline using Pig, Sqoop, and Flume to extract data from weblogs and store it in HDFS.
  • Used Impala with the Parquet file format, a columnar storage layout optimized for large-scale queries.
  • Hands-on experience in tools like Oozie and Airflow to orchestrate jobs.
  • Good knowledge of Azure Data Factory for data transfer from on-premises servers to Azure SQL DB.
  • Proficient in NoSQL databases including HBase, Cassandra, and MongoDB, and their integration with the Hadoop cluster.
  • Expertise in cluster management and configuring Cassandra databases.
  • Strong familiarity with creating Hive tables, Hive joins, and HQL queries, up to writing complex Hive UDFs.
  • Accomplished in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Experience in the practical implementation of AWS cloud services including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda, EBS, ELB, Kinesis, SNS, and Redshift.
  • Worked on data warehousing and ETL tools such as Informatica and Talend.
  • Used Snowflake, which optimizes and stores data in a columnar format within its storage layer, organized into databases specified by the user.
  • Automated data flow between different systems using NiFi.
  • Worked in various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ.
  • Developed web-based UIs using Django, jQuery UI, CSS, and HTML5.
  • Development experience with DBMSs such as Oracle, MS SQL Server, Teradata, and MySQL.
  • Experienced in using build tools like Ant, Gradle, SBT, and Maven to build and deploy applications into the server.
  • Experience in the complete Software Development Life Cycle (SDLC) in both Waterfall and Agile methodologies.

TECHNICAL SKILLS

Languages/Tools: Java, C++, Scala, VB, XML, HTML/XHTML, HDML, DHTML, Python.

Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, ZooKeeper, Spark, PySpark, Kafka, MDM, Storm, Cassandra, Solr, Impala, Redshift, Snowflake

Cloud: AWS (S3, EC2, EMR, Kinesis), GCP, Azure (VM, Cosmos DB, SQL DB)

Operating Systems: Windows 95/98/NT/2000/XP, MS-DOS, UNIX, multiple flavors of Linux.

Databases / NoSQL: Oracle 10g, MS SQL Server 2000, DB2, MS Access, MySQL, Teradata, Cassandra, MongoDB.

GUI Environment: Swing, AWT, Applets.

Messaging & Web Services Technology: SOAP, WSDL, UDDI, XML, SOA, JAX-RPC, IBM WebSphere MQ v5.3, JMS.

Networking Protocols: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP, SMTP, POP3.

Testing & CASE Tools: JUnit, Log4j, Rational ClearCase, CVS, Ant, Maven, JBuilder.

Version Control Systems: Git, SVN, CVS

PROFESSIONAL EXPERIENCE

Confidential, Irving, TX

Big Data Hadoop Developer

Responsibilities:

  • Responsible for developing and supporting data warehousing operations.
  • Involved in petabyte-scale data migration operations.
  • Designed and implemented custom NiFi processors that react to and process data for the data pipeline.
  • Built and developed ETL pipelines using Spark-based applications.
  • Designed technical solutions using object-oriented design concepts.
  • Maintained resources on-premises as well as on the cloud.
  • Began using Apache NiFi to copy data from the local file system to HDFS.
  • Implemented test-driven development in line with the architecture.
  • Developed Java routines in ETL jobs for data transformation as required
  • Developed SQL queries to generate the lookup files needed for the ETL job.
  • Designed and developed ETL integration patterns using Python on Spark (PySpark); used Informatica to extract the required data from operational systems, transform it on the Informatica server, and load it into the data warehouse.
  • Used Talend for integration solutions.
  • Designed VNets and subscriptions to conform to Azure network limits.
  • Used Redshift to run queries against exabytes of data in Amazon S3.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Experience in cloud services using Amazon Web Services (AWS) and Google Cloud Platform (GCP).
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, and Spark SQL; ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Exposed virtual machines and cloud services in the VNets to the Internet using Azure External Load Balancer.
  • Used Impala, which supports file formats such as LZO, SequenceFile, Avro, RCFile, and Parquet.
  • Used Impala, which provides faster access to data in HDFS than other SQL engines.
  • Performed data extraction, aggregation, and consolidation within AWS Glue using PySpark.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC, Parquet, and text files) into AWS Redshift.
  • Tested the functionality of the ETL jobs by validating the data between source and target systems
  • Developed the PySpark code for AWS Glue jobs and for EMR.
  • Used HDFS (on EMR) and S3 as sources to process batch files for daily and weekly jobs.
  • Developed ETL pipelines over S3 Parquet files in the data lake using AWS Glue.
  • Used cloud-based services to maintain and monitor cluster resources.
  • Experience and expertise in the GCP environment, in particular Google BigQuery, Google Pub/Sub, Google Cloud Spanner, Dataflow, Compute Engine, and Google Cloud Storage.
  • Conducted ETL Data Integration, Cleansing, and Transformations using Apache Kudu and Spark.
  • Used Apache NiFi for file conversions and data processing.
  • Developed applications to map data between different sources and destinations using Python and Scala.
  • Hands-on knowledge of loading structured and semi-structured data from CSV and JSON files into Snowflake and querying it with ANSI-standard SQL.
  • Used Snowflake, which organizes data into multiple micro-partitions that are internally optimized and compressed.
  • Reviewed and conducted performance tuning on various Spark applications.
  • Responsible for managing data from disparate sources.
  • Used Terraform to manage resource scheduling, disposable environments, and multitier applications.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used Hive scripts in Spark for data cleaning and transformation.
  • Responsible for migrating data from various conventional data sources as per the architecture.
  • Used AutoSys to schedule Spark and Kafka producer jobs to run in parallel.
  • Developed Spark applications in Scala and Python to migrate the data.
  • Developed Linux based shell scripts to automate the applications.
  • Provided support for building Kafka consumer applications.
  • Performed unit testing and collaborated with the QA team for possible bug fixes.
  • Collaborated with data modelers and other developers during the implementation.
  • Worked in an Agile-based Scrum Methodology.
  • Loaded data into partitioned Hive tables.
  • Exported the analyzed data to relational databases using Kudu for visualization and to generate reports for the Business Intelligence team.

Environment: AWS, Linux, Spark SQL, Python, Scala, CDH 5.12.1, Kudu, Spark, Oozie, Cloudera Manager, MDM, Hue, SQL Server, Maven, Git, Agile methodology, PySpark, Redshift, GCP, Snowflake, Informatica

Confidential, Atlanta, GA

Spark Developer

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Used Sqoop to import data from relational databases such as MySQL and Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data into Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Designed and developed ETL integration patterns using Python on Spark (PySpark).
  • Upgraded Spark from 1.6 to the then-latest version, 2.2, and configured Kafka 0.10; managed Kafka offsets, saving them in an external database such as HBase as well as in Kafka itself.
  • Built and implemented a real-time streaming ETL pipeline using the Kafka Streams API.
  • Installed and configured Confluent Kafka in the R&D line and validated the installation with the HDFS and Hive connectors.
  • Developed ETL jobs in Talend to load various kinds of data formats
  • Used Informatica extensively for ETL and data integration.
  • Used Talend, which provides an easy-to-use graphical interface to develop, build, test, and publish web services and data services.
  • Responsible for managing data coming from different sources through Kafka.
  • Worked on Hive to implement web interfacing and stored the data in Hive tables.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Configured authentication and security in the Apache Kafka pub-sub system.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive Partitioning and Bucketing on the collected data in HDFS.
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Developed traits, case classes, and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Worked on the Cloudera distribution deployed on AWS EC2 instances.
  • Worked on connecting the Cassandra database to the Amazon EMR File System (EMRFS) to store the database in S3.
  • Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with storage on Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for backup storage.
  • Well-versed in using Elastic Load Balancing together with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.

Environment: Hadoop YARN, Spark SQL, Spark Streaming, AWS S3, AWS EMR, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.

Confidential, Chicago, IL

Hadoop/Big Data Analyst

Responsibilities:

  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Worked on designing and developing the real-time analysis module for the analytics dashboard using Cassandra, Kafka, and Spark Streaming.
  • Involved in running MapReduce jobs for processing millions of records.
  • Built reusable Hive UDF libraries for business requirements, enabling users to use these UDFs in Hive queries.
  • Responsible for Data Modeling in Cassandra as per our requirement.
  • Managed and scheduled jobs on a Hadoop cluster using Oozie and cron jobs.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
  • Created UDFs to calculate the pending payment for the given Residential or Small Business customer, and used in Pig and Hive Scripts.
  • Deployed and built the application using Maven.
  • Used Python scripting for large scale text processing utilities
  • Handled importing of data from various data sources, performed transformations using Hive. (External tables, partitioning).
  • Responsible for data modeling in MongoDB to load both structured and unstructured data.
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
  • Wrote test cases in MRUnit for unit testing of MapReduce programs.
  • Involved in developing templates and screens in HTML and JavaScript.
  • Developed the XML Schema and web services for data maintenance and structures.
  • Built and deployed applications into multiple UNIX based environments and produced both unit and functional test results along with release notes.
  • Experience in cloud technologies such as IBM Bluemix, AWS, and GCP.

Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, Cassandra, Kafka, Storm, Maven, Python, Cloud Manager, Ambari, JDK, J2EE, Struts, JSP, Servlets, Elasticsearch, WebSphere, HTML, XML, JavaScript, MRUnit.

Confidential, Dallas, TX

Hadoop/Big Data Analyst

Responsibilities:

  • Analyzed the Hadoop cluster using different big data analytic tools including Pig, Hive, HBase, and MapReduce.
  • Extracted customers' daily transaction data from DB2, exported it to Hive, and set up online analytical processing (OLAP).
  • Installed and configured Hadoop, MapReduce, and HDFS clusters
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Analyzed the logs using Hive queries to identify issues and behavioral patterns.
  • Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements.
  • Used Flume to collect, aggregate, and store the weblog data from different sources like web servers, mobile, and network devices and import to HDFS
  • Developed an Oozie workflow to automate loading data into HDFS and pre-processing it with Pig scripts.
  • Integrated MapReduce with HBase to import bulk data using MR programs.
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
  • Developed a data pipeline using Sqoop, Pig, and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Wrote SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, using tools such as SQL Profiler and Database Tuning Advisor (DTA).

Environment: HDFS, MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse, Shell Scripting.

Confidential

Software Engineer

Responsibilities:

  • Involved in the analysis, design, implementation, and testing of the project.
  • Developed web services using Python
  • Implemented the presentation layer with HTML, XHTML and JavaScript.
  • Developed web components using JSP, Servlets, and JDBC.
  • Designed tables and indexes.
  • Worked extensively with JUnit to test the application code for client-server data transfer.
  • Developed and enhanced product designs in alignment with business objectives.
  • Used SVN as a repository for managing/deploying application code.
  • Successfully participated in system integration and user acceptance testing.
  • Developed the front end using JSTL, JSP, HTML, and JavaScript.
  • Wrote complex SQL queries and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Actively involved in system testing.
  • Involved in implementing the service layer using Spring IOC module.
  • Prepared the installation guide, customer guide, and configuration document, which were delivered to the customer along with the product.

Environment: Python, Java, JSP, JSTL, HTML, JavaScript, Servlets, JDBC, MySQL, JUnit, Eclipse IDE.
