Big Data Lead Developer Resume

Dallas, TX

PROFESSIONAL SUMMARY:

  • Big Data Analytics Engineer with 8+ years of IT experience, including 6+ years in Big Data, spanning development, system design, enhancement, maintenance, support, re-engineering and debugging of mission-critical, complex applications.
  • Experience with Apache Hadoop technologies such as HDFS, MapReduce, YARN, Pig, Hive, Impala, HCatalog, Sqoop, Spark, Kafka, StreamSets, Storm, Spark SQL, Spark Streaming and Hadoop Streaming.
  • Experience in loading structured, semi-structured and unstructured data into Hadoop from sources such as JSON, CSV and XML files, Teradata, MS SQL Server and Oracle.
  • Experience in importing and exporting data in different formats between HDFS/HBase and various RDBMS databases.
  • Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, MySQL, Cloudera and MongoDB.
  • Experienced in working with the OpenStack platform and its components, such as Compute, Orchestration and Swift.
  • Expertise across all phases of the SDLC: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, production support and documentation.
  • Good exposure to running Spark and Kafka on YARN and to file formats such as Avro, JSON, XML and sequence files.
  • Hands-on experience with Spark-Scala programming and good knowledge of Spark architecture and its in-memory processing (a minimal sketch follows this list).
  • Working experience in creating complex data ingestion pipelines, data transformations, data management and data governance in a centralized enterprise data hub.
  • Experience building Kibana dashboards and visualizations on an Elasticsearch cluster.
  • Experience working with streaming data: reading it from Kafka and processing it.
  • Extensively worked with Kafka streaming libraries to consume data from web servers.
  • Hands on experience with Amazon Web Services (AWS) cloud services like EC2, S3, EBS, RDS and VPC.
  • Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
  • Experience working with Azure Databricks and Azure cloud platform.
  • Knowledge of Azure Data Factory, Logic Apps and Blob Storage.
  • Experience in storing and processing unstructured data using NoSQL databases such as HBase and MongoDB.
  • Experience in writing workflows and scheduling jobs using Oozie.
  • Ability to lead a team and develop a project from scratch.
  • Experience working with Apache NiFi.
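
A minimal Spark-Scala sketch of the in-memory processing referenced above; the input path and column names are hypothetical placeholders, not taken from any of the projects below.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object InMemoryProcessingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("in-memory-processing-sketch")
          .getOrCreate()

        // Hypothetical input: a CSV extract of point-of-sale events.
        val events = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/data/raw/pos_events.csv")

        // Persist the DataFrame so the two aggregations below reuse the
        // in-memory copy instead of re-reading the file from storage.
        events.persist(StorageLevel.MEMORY_AND_DISK)

        events.groupBy("store_id").count().show()
        events.groupBy("product_id").count().show()

        events.unpersist()
        spark.stop()
      }
    }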

TECHNICAL SKILLS:

Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Pig, Oozie, Flume, ZooKeeper, Sqoop, HBase, Cassandra, Spark Streaming, Zeppelin, StreamSets, Kafka.

Database Skills: SQL, PL/SQL, MySQL, HBase, Hive, Impala.

OpenStack: Elasticsearch, Logstash, Ansible, RHEL 7, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana.

Cloud Ecosystem: Amazon Web Services (EC2, EMR and S3), Cisco Cloud Services.

Languages: Scala, Python, Java, SQL, Shell Scripting.

Build Tools: Maven, SBT, Chef, Jenkins and Gradle.

Operating Systems: UNIX, Windows, Linux

Other Tools: Tableau, Hue, RDP, Putty, WinSCP.

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Big Data Lead Developer

Responsibilities:

  • Designed and developed Big Data analytical solutions for the business and AI teams.
  • Worked on data ingestion, loading data from mobile users and POS transactions.
  • Developed an end-to-end pipeline to load data into Salesforce Marketing Cloud for campaigns.
  • Built automated processes for data imports using Python.
  • Extensively used Azure services: ADLS for storing data and ADF pipelines for triggering resource-intensive jobs.
  • Created pipelines in ADF using linked services and datasets to extract, transform and load data from sources such as Azure SQL and Blob Storage.
  • Translated business problems into a comprehensive analytical framework using advanced Hadoop/Spark techniques.
  • Created Spark batch jobs that read daily data from Azure Data Lake file systems and store it in the Hive data warehouse (see the sketch after this list).
  • Built Hive tables to load the POS data from Salesforce.
  • Built an automated process to ingest data into Hive tables that are then loaded into Salesforce.
  • Built the end-to-end ETL pipeline flow to load and transform data every hour.
  • Imported structured and unstructured data from legacy systems, processed it and loaded it into Spark tables.
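
A minimal sketch of the kind of Spark batch job described above, reading a day's data from Azure Data Lake and appending it to a Hive warehouse table; the storage account, container, paths and table name are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    object DailyLakeToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("daily-lake-to-hive")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical ADLS Gen2 location of the day's extract.
        val runDate = args.headOption.getOrElse("2020-01-01")
        val source  = s"abfss://raw@examplelake.dfs.core.windows.net/pos/$runDate/"

        // Stamp the load date and append it as a new partition of the
        // hypothetical Hive warehouse table.
        spark.read.parquet(source)
          .withColumn("load_date", lit(runDate))
          .write
          .mode("append")
          .partitionBy("load_date")
          .saveAsTable("marketing.pos_transactions")

        spark.stop()
      }
    }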

Confidential, Dallas, TX

Big Data Analytics Engineer

Responsibilities:

  • Responsible for developing a data pipeline using Kafka, HBase, Spark, Elasticsearch and Hive to extract data from log messages and other sources, store it in HDFS and visualize it in Kibana.
  • Developed Spark-Scala applications to migrate historical and incremental data loads from different RDBMS systems into Hadoop.
  • Developed a real-time data processing engine to stream data from Kafka and load it into Elasticsearch and Hive.
  • Worked closely with the architect; enhanced and optimized product Spark and Scala code to aggregate, group and run data mining tasks using the Spark framework.
  • Developed a generic Spark framework, reused by various applications within the enterprise, to stream data from Kafka, parse the JSON payloads and store them in HBase (see the sketch after this list).
  • Developed several Hive queries to create, partition and merge data from several sources and perform analytics to load the data into fact tables.
  • Built Kafka direct stream libraries to consume real-time data.
  • Built an HBase cluster to store and process the streaming data.
  • Built an Elasticsearch cluster to store the processed and transformed data in indexes.
  • Generated Kibana dashboards and visualizations on Elasticsearch indexes.
  • Generated real-time reports to improve agent productivity and customer experience.
  • Deployed the real-time data processing engine in production and built monitoring for the application.
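
A minimal sketch in the spirit of the generic Kafka-to-HBase framework described above, using the Spark Streaming Kafka direct-stream API; the broker address, topic, consumer group, HBase table, column family and the logId field are hypothetical, and json4s stands in for whatever JSON parser the actual framework used.

    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.json4s.DefaultFormats
    import org.json4s.jackson.JsonMethods.parse

    object KafkaJsonToHBaseSketch {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-json-to-hbase"), Seconds(30))

        // Hypothetical broker, consumer group and topic.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "log-ingest",
          "auto.offset.reset"  -> "latest",
          "enable.auto.commit" -> (false: java.lang.Boolean)
        )
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("app-logs"), kafkaParams))

        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One HBase connection per partition, reused for every record in it.
            val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("app_logs"))
            implicit val formats: DefaultFormats.type = DefaultFormats
            records.foreach { rec =>
              val json  = parse(rec.value())
              val rowId = (json \ "logId").extract[String] // hypothetical row-key field
              val put   = new Put(Bytes.toBytes(rowId))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(rec.value()))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }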

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Involved in complete project life cycle starting from design discussion to production deployment.
  • Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Kafka, Spark and StreamSets.
  • Performed transformations, cleaning and filtering on imported data using Hive and Spark.
  • Developed Spark core and Spark SQL scripts using Scala for faster data processing.
  • Created automated Python scripts to convert data from different sources and generate the ETL pipelines.
  • Configured StreamSets to store the converted data in SQL Server using JDBC drivers.
  • Extensively used big data analytical and processing tools (Hive, Spark Core, Spark SQL) for batch processing of large data sets on the Hadoop cluster.
  • Improved the performance of and optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, RDDs and YARN.
  • Imported and exported data between environments such as MySQL and HDFS and deployed into production.
  • Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance (see the sketch after this list).
  • Developed Common Data Integrity and Data Sourcing Frameworks for all the risk applications, using Scala and Spark.
  • Developed the Oozie workflows with Sqoop actions to migrate the data from relational databases like Oracle, Netezza, Teradata to HDFS.
  • Extensively worked with Amazon S3 for data storage and retrieval.
  • Worked with Alteryx, a data analytics tool, to develop workflows for the ETL jobs.
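
A minimal sketch of partitioning a Hive table and setting tuning parameters from Spark SQL, as mentioned above; the risk database, table and column names are hypothetical, and bucketing would be added with a CLUSTERED BY clause in the same DDL.

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partitioned-hive-load-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Standard Hive tuning parameters for dynamic partition inserts.
        spark.sql("SET hive.exec.dynamic.partition = true")
        spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

        // Hypothetical fact table partitioned by load date.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS risk.fact_positions (
            |  account_id STRING,
            |  instrument STRING,
            |  exposure   DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS ORC""".stripMargin)

        // Merge the staged data into the partitioned fact table.
        spark.sql(
          """INSERT OVERWRITE TABLE risk.fact_positions PARTITION (load_date)
            |SELECT account_id, instrument, exposure, load_date
            |FROM risk.stg_positions""".stripMargin)

        spark.stop()
      }
    }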

Confidential, Raleigh, NC

Big Data/Cloud Engineer

Responsibilities:

  • Created StreamSets pipelines for collecting logs, alerts and metrics from customer Vpods.
  • Created automated Python scripts to validate the data flow through Elasticsearch; worked with InfluxDB to store the metrics data collected from each customer Vpod.
  • Configured StreamSets to attach the Vpod ID to each record flowing through and to create topics in Kafka.
  • Set up the project/tenant with Keystone user roles.
  • Created instances in OpenStack to set up the environment.
  • Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Troubleshot Nova and Glance issues in OpenStack as well as issues on the Kafka and RabbitMQ buses.
  • Configured MongoDB replication with replica set factors, arbiters, voting, priority, server distribution and slave delays.
  • Installed MongoDB RPMs and tar files and prepared YAML config files.
  • Performed data migration between multiple environments using the mongodump and mongorestore commands.
  • Evaluated indexing strategies to support queries and sort documents using index keys.
  • Converted Hive/SQL queries into Spark transformations using Scala (see the sketch after this list).
  • Optimized overall cluster performance by caching/persisting and unpersisting data.
  • Worked with cloud computing services and networking between different tenants.
  • Installed and worked with Hive, Pig and Sqoop on the Hadoop cluster.
  • Built the Kafka cluster setup required for the environment.
  • Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
  • Used benchmarking tools such as TeraSort, TestDFSIO and HiBench.
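
A minimal sketch of converting a Hive/SQL query into equivalent Spark DataFrame transformations in Scala, as mentioned above; the ops.vpod_metrics table and its columns are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, count, lit}

    object HiveQueryToDataFrameSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-dataframe-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL (hypothetical table and columns):
        //   SELECT vpod_id, COUNT(*) AS alerts, AVG(cpu_pct) AS avg_cpu
        //   FROM ops.vpod_metrics
        //   WHERE metric_date = '2017-06-01'
        //   GROUP BY vpod_id;

        // The same query expressed as DataFrame transformations.
        val metrics = spark.table("ops.vpod_metrics")
        metrics
          .filter(metrics("metric_date") === "2017-06-01")
          .groupBy("vpod_id")
          .agg(count(lit(1)).as("alerts"), avg("cpu_pct").as("avg_cpu"))
          .show()

        spark.stop()
      }
    }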

Confidential, Logan, UT

Hadoop Developer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
  • Worked on data processing using Hive queries in HDFS and shell scripts to wrap the HQL scripts.
  • Developed and Deployed Oozie Workflows for recurring operations on Clusters.
  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries and reduce execution time (see the sketch after this list).
  • Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
  • Participated in requirements gathering for the project and documented the business requirements.
  • Used Ansible scripts to deploy Cloudera CDH 5.4.1 and set up the Hadoop cluster.
  • Worked with cloud computing services and networking between different tenants.
  • Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
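
A minimal sketch of the map-side join idea above, expressed as Spark's broadcast join rather than a Hive MAPJOIN (on the Hive side the equivalent is hive.auto.convert.join or a MAPJOIN hint); the retail.sales and retail.stores tables and their columns are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object MapSideJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("map-side-join-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical large fact table and small dimension table.
        val sales  = spark.table("retail.sales")
        val stores = spark.table("retail.stores")

        // Broadcasting the small table keeps the join on the map side and
        // avoids shuffling the large fact table.
        sales.join(broadcast(stores), Seq("store_id"))
          .groupBy("region")
          .count()
          .show()

        spark.stop()
      }
    }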

Confidential

Software Engineer

Responsibilities:

  • Responsible for understanding the scope of the project and requirement gathering.
  • Utilized strong C++ and Java programming and communication skills in a team environment.
  • Extensively worked on the user interface for several modules using JSPs, JavaScript and Ajax.
  • Developed a framework for data processing using design patterns, Java and XML.
  • Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
  • Developed a Type3 application, a disaster recovery module and console commands.
  • Debugged and fixed bugs for the MMC project using GDB and stack traces.
  • Solved memory allocation problems caused by big-endian/little-endian memory conversions.
  • Contributed to the support and maintenance of software applications with the GCC and ICC compilers.
  • Worked with CMS and CCMS, proprietary ALU configuration management tools, and used FlexeLint for lint checking.
  • Involved in design and implementation of web tier using Servlets and JSP.
