Big Data Engineer (Hadoop Developer) Resume

Chicago

SUMMARY

  • Over 8 years of IT experience in software design, development, implementation, and support of business applications for the telecom, health, and insurance industries
  • Worked extensively on installing and configuring Hadoop ecosystem components Hive, SQOOP, PIG, HBase, Zookeeper and Flume
  • Performed a key role in understanding the business requirements for migrating data to data warehouse.
  • Performed unit testing at various levels of the ETL and actively involved in team code reviews.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Designed and implemented effective analytics solutions and models with Snowflake.
  • Compared the performance of the Hadoop based system to the existing processes used for preparing the data for analysis
  • Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
  • Implemented ETL operations using Big Data platform
  • Good knowledge of Spark and Scala, SQL queries, and creating database objects such as stored procedures and triggers to implement business rules.
  • Implementing a CI/CD Pipeline involving Bitbucket, Jenkins, Chef, Docker for complete automation from commit to deployment.
  • Hands-on experience with build and development tools such as Maven, Ant, Log4j, and JUnit
  • Working with the data extraction, transformation and load in Hive, Pig and HBase
  • Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
  • Hands-on experience with streaming data ingestion and processing
  • Experience in designing different time driven and data driven automated workflows using Oozie.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency
  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
  • Skilled in data migration from relational databases to the Hadoop platform using Sqoop.
  • Installed and configured Hadoop ecosystem components
  • Wrote Flume configuration files for importing streaming log data into HBase.
  • Imported several transactional logs from web servers with Flume to ingest the Data into HDFS.
  • Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created partitioned Hive tables and worked on them using HiveQL.
  • Experienced in migrating ETL transformations using Pig Latin Scripts, transformations, join operations.
  • Good understanding of MPP databases such as HP Vertica and Impala.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS
  • Experience in using design patterns, Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and Linux.
  • Expertise in relational databases like Oracle, MySQL, and SQL Server.
  • Experience in implementing projects both in Agile and Waterfall methodologies.
  • Good experience with cloud technologies: AWS, Azure, and GCP
  • Strong experience with data warehousing and ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
  • Migrated HiveQL queries into Spark SQL to improve performance.
  • Performed Data integrity, validation and testing on the data migrated into the data warehouse.

TECHNICAL SKILLS

  • Bigdata/Hadoop
  • Oracle12c
  • SQOOP
  • Pig
  • Hive
  • SQL
  • PL/SQL
  • API
  • HBase
  • NoSQL
  • Python
  • PySpark
  • ADF
  • SaaS
  • Erwin
  • Kafka
  • Spark
  • SSIS
  • Map/Reduce
  • ETL
  • SSRS
  • Tableau
  • Oozie
  • Teradata

PROFESSIONAL EXPERIENCE

Big Data Engineer (Hadoop Developer)

Confidential

Responsibilities:

  • Expertise in designing and deploying Hadoop clusters and different Big Data analytics tools, including Hive, AutoSys, Spark, and Impala, with the Cloudera distribution
  • Tuned the performance of long-running jobs and added parallelism to them
  • Created views on top of existing data to transform with HiveQL
  • Reviewed Hadoop cluster configuration and application configuration, and configured the required parameters
  • Built processing flows with the UC4 scheduler and worked on data cleansing routines
  • Supported production applications for daily loads and automated them using AutoSys
  • Performed hive query optimization techniques for integration
  • Responsible for running benchmarks and fine-tuning the applications
  • Performed monthly refresh activities and produced data for MBRs
  • Executed SQL statements to validate the application data and updated the data layouts
  • Collaborated with other teams such as Product, Technology, Risk, Compliance, Legal, and Operations
  • Worked with teams in different geographical regions to deliver solutions
  • Managed and deployed applications as Docker containers on AWS cloud infrastructure
  • Deployed an Elastic Load Balancer for scalability and high availability on AWS
  • Implemented and managed continuous automation on AWS with CloudFormation and Terraform
  • Good knowledge of cloud technologies such as AWS, Azure, and GCP
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage.
  • Responsible for keeping up to date on new and updated tools in the Big Data environment
  • Designing and implementing Hive queries and functions for evaluation, filtering, loading and storing of data
  • Implementing a risk assessment process and presenting the global risk program to the Board along with senior management
  • Automated jobs using AutoSys for all environments with regression testing
  • Worked on solution requirements, including transformations on the given data sources, and developed mechanisms for ingesting and loading data from the sources
  • Worked on complex and mission-critical data analysis for a wide range of applications using data in different formats, volumes, and source systems at company scale
  • Provided production support for the Sqoop jobs running in production.
  • Experience creating databases, tables, and views in HiveQL and Impala
  • Worked with the data delivery team to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users on Hortonworks
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Involved in unit testing, integration testing, and deployment

Hadoop Developer

Confidential, Chicago

Responsibilities:

  • Involved in migrating applications from Cloudera to Hortonworks. Experience working with various Cloudera distributions (CDH4/CDH5) and knowledge of Hortonworks
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytics tools, including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark, and Impala.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Sqoop and Hive
  • Professional Java developer with strong expertise in data engineering and big data technologies.
  • Hands on experience in programming using Java, Scala and SQL.
  • Analyzed the data using HiveQL to identify the different correlations and used core Java technologies to create Hive/Pig UDFs to use in the project.
  • Implemented complex MapReduce programs to perform joins on the Map side using Distributed Cache in Java.
  • Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager and Ambari
  • Created multiple Map Reduce Jobs using Java API, Pig and Hive for data extraction
  • Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
  • Operated on Java/J2EE systems with different databases, which include Oracle, MySQL and DB2.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked with the Talend ETL tool and used features like context variables and database and file components such as Oracle input, Oracle output, tFileCompare, tFileCopy, and Oracle close
  • Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into target database.
  • Worked with D-Series for Scheduling Sqoop jobs
  • Used HiveQL for data analysis, such as creating tables and importing structured data into specified tables for reporting.
  • Hands on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Designed and Implemented Azure architectures, environments, and resources
  • Implemented migrations of applications and databases onto Azure
  • Contributed to or authored blogs, whitepapers, and presentations on Azure technical and strategic topics
  • Experience creating databases, tables, and views in HiveQL and Impala
  • Worked with the data delivery team to set up new Hadoop users and Linux users, set up Kerberos principals, and test HDFS, Hive, Pig, and MapReduce access for the new users on Hortonworks
  • Good knowledge in using Hibernate for mapping Java classes with database and using Hibernate Query Language (HQL).
  • Extensive hands-on experience in writing complex MapReduce jobs and Hive data modeling.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Hadoop Developer

Confidential

Responsibilities:

  • Developed schedulers that communicated with cloud-based services (AWS) to retrieve the data.
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading, and storing of data.
  • Creating Hive tables and working on them using HiveQL.
  • Developed data pipeline using Kafka and Storm to store data into HDFS.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Involved in review of functional and non-functional requirements.
  • Implemented frameworks using Java and Python to automate the ingestion flow.
  • Developed views and templates with Python and Django's view controller and templating language to create a user-friendly website interface.
  • Experience using Python with various libraries, such as graphing libraries, MySQLdb for database connectivity, and python-twitter.
  • Developed a fully automated continuous integration system using Git, Gerrit, Jenkins, MySQL and custom tools developed in Python and Bash.
  • Responsible for managing data coming from different sources.
  • Loaded the CDRs from relational databases using Sqoop, and from other sources using Flume, into the Hadoop cluster.
  • Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
  • Involved in loading data from UNIX file system and FTP to HDFS.
  • Developed Hive queries to analyze the output data.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Designed cluster coordination services through ZooKeeper.
  • Collected log data from web servers and integrated it into HDFS using Flume.
  • Used Hive to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
  • Designed and implemented Spark jobs to support distributed data processing.
  • Supported the existing MapReduce programs running on the cluster.
  • Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
  • Involved in migrating the Hive data to Google BigQuery
  • Implemented automated workflows with Apache Airflow and integrated the scripts with Jenkins
  • Used Jenkins to deploy code to Google Cloud with new namespaces, create Docker images, and push them to the Google Cloud container registry.
  • Expertise in documenting the deployment process and in high-level preparation of release notes, checklists, quality process docs, analysis docs, and versioned configuration docs.
  • Led many formal and informal sessions on security issues and the importance of best practices in GCP.
  • Expertise in designing the Google Cloud architecture in line with financial regulations from a security point of view.
  • Expertise in several GCP services, focusing on security, Kubernetes, and BigQuery.
  • Expertise in automation of the infrastructure using Terraform for both AWS and GCP.
  • Created a Hive table to store the processed results in tabular format.
  • Involved in building multitenant solutions using Python and internal tools, delivering complex cloud platforms
  • Worked with Spark, improving the performance and optimization of the existing applications in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
  • Monitored Hadoop jobs and reviewed the logs of failed jobs to debug issues based on the errors
  • Fine-tuned Hadoop applications for high performance and throughput
  • Worked on a POC for real-time data streaming using Spark with Kafka.
  • Supported development with application architecture in both real-time and batch big data processing
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive
  • Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).

Java Developer

Confidential

Responsibilities:

  • Involved in Agile sprint methodologies of the SDLC for project management, design, and development
  • Designed and implemented the training and reports modules of the application using Servlets, JSP, and Ajax
  • Experience in using Spring Integration and RabbitMQ for creation of web services and communication.
  • Interacted with business users and developed custom reports based on the criteria defined.
  • Gathered requirements and collected information.
  • Analyzed the gathered information to prepare a detailed work plan and task breakdown structure
  • Developed custom JSP tags for the application
  • Involved in the phases of the SDLC (Software Development Life Cycle), including requirement collection, design and analysis of customer specifications, and development and customization of the application
  • Used Postman to trigger HTTP requests, speeding up work on the SOAP- and REST-based APIs.
  • Created and consumed SOAP and REST services using CXF and used Mule ESB to route various calls to do validation of service input and to handle exceptions.
  • Used Quartz schedulers to run the jobs sequentially within the given time
  • Implemented the reports module using JasperReports for business intelligence
  • Good experience writing SQL/Transact-SQL (DDL, DML, and DCL)
  • Developing, Creating New Database and Database Objects Such as Tables, Views, Indexes, Complex Stored
  • Deployed the business application on a Tomcat server at the client location
  • End-to-end system development and testing, including unit integration and system integration
  • Coordinated activities with an onshore and offshore team of 10+ members
  • Responsible for effort estimation and timely production deliveries
  • Created and executed half-yearly and yearly load jobs that update new rates, discounts, etc. for the claim calculations in the database and files
  • Extensively used Java multithreading to implement batch jobs with JDK 1.5 features
  • Configured the project on WebLogic 10.3 application servers
  • Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
