Big Data Lead Developer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Big Data Analytics Engineer with 8+ years of IT experience, including 6+ years in Big Data, spanning development, system design, enhancement, maintenance, support, re-engineering, and debugging of mission-critical, complex applications.
- Experience with Apache Hadoop technologies such as HDFS, MapReduce, YARN, Pig, Hive, Impala, HCatalog, Sqoop, Spark, Kafka, StreamSets, Storm, Spark SQL, Spark Streaming, and Hadoop Streaming.
- Experience loading structured, semi-structured, and unstructured data into Hadoop from sources such as JSON, CSV, and XML files, Teradata, MS SQL Server, and Oracle.
- Experience importing and exporting data in different formats between HDFS/HBase and various RDBMS databases.
- Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, MySQL, Cloudera, and MongoDB.
- Experienced with the OpenStack platform and all of its components, such as Compute, Orchestration, and Swift.
- Expertise across all SDLC phases: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, production support, and documentation.
- Good exposure to the YARN environment with Spark and Kafka, and to file formats such as Avro, JSON, XML, and sequence files.
- Hands-on experience with Spark/Scala programming and good knowledge of the Spark architecture and its in-memory processing.
- Working experience creating complex data ingestion pipelines, data transformations, data management, and data governance in a centralized enterprise data hub.
- Experience building Kibana dashboards with visualizations on an Elasticsearch cluster.
- Experience working with streaming data: reading data from Kafka and processing it.
- Extensively worked with Kafka streaming libraries to consume data from web servers.
- Hands-on experience with Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS, and VPC.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on AWS, including exporting and importing data to and from S3.
- Experience working with Azure Databricks and the Azure cloud platform.
- Knowledge of Azure Data Factory, Logic Apps, and Blob Storage.
- Experience storing and processing unstructured data using NoSQL databases such as HBase and MongoDB.
- Experience writing workflows and scheduling jobs using Oozie.
- Ability to lead a team and develop a project from scratch.
- Experience working with Apache NiFi.
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Pig, Oozie, Flume, ZooKeeper, Sqoop, HBase, Cassandra, Spark Streaming, Zeppelin, StreamSets, Kafka.
Database Skills: SQL, PL/SQL, MySQL, HBase, Hive, Impala.
OpenStack: Elasticsearch, Logstash, Ansible, RHEL 7, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana.
Cloud Ecosystem: Amazon Web Services (EC2, EMR, and S3), Cisco Cloud Services.
Languages: Scala, Python, Java, SQL, Shell Scripting.
Build Tools: Maven, SBT, Chef, Jenkins and Gradle.
Operating Systems: UNIX, Windows, Linux
Other Tools: Tableau, Hue, RDP, PuTTY, WinSCP.
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Big Data Lead Developer
Responsibilities:
- Designed and developed Big Data analytical solutions for the business and AI teams.
- Worked on data ingestion, loading data from mobile users and POS transactions.
- Developed an end-to-end pipeline to load data into Salesforce Marketing Cloud for campaigns.
- Built an automated process for data imports using Python.
- Extensively used Azure services: ADLS for data storage and ADF pipelines for triggering resource-intensive jobs.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL and Blob Storage.
- Translated business problems into comprehensive analytical frameworks using advanced Big Data Hadoop/Spark techniques.
- Created Spark batch jobs that read daily data from Azure Data Lake file systems and store it in the Hive data warehouse.
- Built Hive tables to load the ps data from Salesforce.
- Built an automated process to ingest data into the Hive tables that are loaded to Salesforce.
- Built the end-to-end ETL pipeline flow, loading and transforming data every hour.
- Imported structured and unstructured data from legacy systems, processed it, and loaded it into Spark tables.
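The automated Python import process described above could be sketched roughly as follows; the file layout, field names, and transformations are illustrative assumptions, not the actual production code:

```python
import csv
import io
import json

def transform_row(row):
    """Normalize one raw record before loading it downstream (hypothetical fields)."""
    return {
        "user_id": row["user_id"].strip(),
        "amount": round(float(row["amount"]), 2),
        "channel": row.get("channel", "pos").lower(),
    }

def run_import(csv_text):
    """Convert a raw CSV export into JSON lines ready for the load step."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(transform_row(r)) for r in reader]

# Example input with the kind of noise (whitespace, casing) the transform cleans up.
raw = "user_id,amount,channel\n u1 ,19.999,POS\nu2,5.5,Mobile\n"
```

In the real pipeline the JSON lines would be written to ADLS or handed to an ADF-triggered load rather than kept in memory.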
Confidential, Dallas, TX
Big Data Analytics Engineer
Responsibilities:
- Responsible for developing a data pipeline using Kafka, HBase, Spark, Elasticsearch, and Hive to extract data from log messages and other sources, store it in HDFS, and visualize it in Kibana.
- Developed Spark/Scala applications to migrate historical and incremental data loads from different RDBMS systems into Hadoop.
- Developed a real-time data processing engine to stream data from Kafka and load it into Elasticsearch and Hive.
- Worked closely with the architect; enhanced and optimized product Spark and Scala code to aggregate, group, and run data mining tasks using the Spark framework.
- Developed a generic framework using Spark to stream data from Kafka, process/parse the JSON data, and store it in HBase; the framework is reused by various applications within the enterprise.
- Developed several Hive queries to create tables, partition and merge data from several sources, and perform analytics to load the data into fact tables.
- Built Kafka direct-stream libraries to consume real-time data.
- Built an HBase cluster to store and process the streaming data.
- Built an Elasticsearch cluster to store the processed and transformed data in indexes.
- Generated Kibana dashboards and visualizations from Elasticsearch indexes.
- Generated real-time reports to improve agent productivity and customer experience.
- Deployed the real-time data processing engine to production and built monitoring for the application.
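The parse step of a Kafka-to-HBase framework like the one above might look roughly like this in outline; the event fields are hypothetical, and the Kafka consumer and HBase writes are omitted:

```python
import json

def parse_event(message):
    """Parse one JSON log message into a flat record; return None on bad input."""
    try:
        event = json.loads(message)
    except json.JSONDecodeError:
        return None
    return {
        "agent": event.get("agent", "unknown"),
        "status": event["status"],
        "latency_ms": int(event["latency_ms"]),
    }

def process_batch(messages):
    """Drop malformed messages and keep the parsed records (the store step is elided)."""
    parsed = (parse_event(m) for m in messages)
    return [p for p in parsed if p is not None]
```

Tolerating malformed messages instead of failing the batch is what lets a streaming job keep running against noisy log sources.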
Confidential, Dallas, TX
Big Data Engineer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment.
- Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Kafka, Spark and StreamSets.
- Performed transformations, cleaning and filtering on imported data using Hive and Spark.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Created automated python scripts to convert the data from different sources and to generate the ETL pipelines.
- Configured StreamSets to store the converted data in SQL Server using JDBC drivers.
- Extensively used big data analytical and processing tools Hive, Spark Core, Spark SQL for batch processing large data sets on Hadoop cluster.
- Improved performance and optimized existing algorithms in Hadoop using Spark: SparkContext, Spark SQL, DataFrames, RDDs, and YARN.
- Imported and exported data between environments such as MySQL and HDFS, and deployed to production.
- Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
- Developed Common Data Integrity and Data Sourcing Frameworks for all the risk applications, using Scala and Spark.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle, Netezza, and Teradata to HDFS.
- Extensively worked with Amazon S3 for data storage and retrieval.
- Worked with Alteryx, a data analytics tool, to develop workflows for ETL jobs.
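Hive bucketing, used above for query tuning, assigns rows to a fixed number of files by hashing the clustering column; a rough Python sketch of the idea (Hive's actual hash function differs, and the column and bucket count here are illustrative):

```python
import zlib

def bucket_for(key, num_buckets):
    """Mimic Hive-style bucketing: hash the clustering key into a fixed bucket.
    Python's built-in str hash is salted per process, so use a stable CRC32."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def split_into_buckets(rows, key_col, num_buckets):
    """Group rows into num_buckets lists, as a bucketed Hive table lays out files."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key_col], num_buckets)].append(row)
    return buckets
```

Because equal keys always land in the same bucket, joins on the bucketed column can pair buckets directly, which is what makes bucketed map-side joins cheap.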
Confidential, Raleigh, NC
Big Data/Cloud Engineer
Responsibilities:
- Created StreamSets pipelines to collect logs, alerts, and metrics from customer Vpods.
- Created automated Python scripts to validate the data flowing through Elasticsearch. Worked with InfluxDB to store the metrics collected from each customer Vpod.
- Configured StreamSets to attach the Vpod ID to each record flowing through and to create topics in Kafka.
- Set up projects/tenants with Keystone user roles.
- Created instances in OpenStack to set up the environment.
- Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Troubleshot Nova and Glance issues in OpenStack, as well as Kafka and the RabbitMQ bus.
- Configured MongoDB replication: replica set factors, arbiters, voting, priority, server distribution, and slave delays.
- Installed MongoDB from RPMs and tar files and prepared YAML config files.
- Performed data migration between multiple environments using the mongodump and mongorestore commands.
- Evaluated indexing strategies to support queries and sorting documents using index keys.
- Experience in converting Hive/SQL queries into Spark transformations using Scala.
- Optimized overall cluster performance by caching/persisting and unpersisting data.
- Experienced in working with Cloud Computing Services, Networking between different Tenants.
- Installed and Worked with Hive, Pig, Sqoop on the Hadoop cluster.
- Experience building the Kafka cluster setup required for the environment.
- Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
- Experience using benchmarking tools such as TeraSort, TestDFSIO, and HiBench.
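The MongoDB YAML configuration mentioned above might look roughly like this for one replica-set member; paths, port, and set name are placeholder values, not the actual deployment:

```yaml
# mongod.conf -- one replica-set member (illustrative values only)
storage:
  dbPath: /var/lib/mongo
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0
replication:
  replSetName: rs0
```

Once each member is running with the same replSetName, the set is initiated from the mongo shell with rs.initiate(), and arbiters can be added with rs.addArb().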
Confidential, Logan, UT
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
- Worked on Data Processing using Hive queries in HDFS and the shell Scripts to wrap the HQL scripts.
- Developed and Deployed Oozie Workflows for recurring operations on Clusters.
- Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time.
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Participated in requirement gathering of the project in documenting the business requirements.
- Used Ansible scripts to deploy Cloudera CDH 5.4.1 and set up the Hadoop cluster.
- Experienced in working with Cloud Computing Services, Networking between different Tenants.
- Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
Confidential
Software Engineer
Responsibilities:
- Responsible for understanding the scope of the project and requirement gathering.
- Utilized strong C++ and Java programming and communication skills in a team environment.
- Extensively worked on the user interface for a few modules using JSPs, JavaScript, and Ajax.
- Developed framework for data processing using Design patterns, Java, XML.
- Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
- Developed a Type 3 application, a disaster recovery module, and console commands.
- Debugged and fixed bugs for the MMC project using GDB & stack traces.
- Solved memory allocation problems caused by big-endian and little-endian memory conversions.
- Contributed to support and maintenance of software applications with GCC & ICC compilers.
- Worked with the CMS & CCMS configuration management tools (proprietary ALU tools) and used FlexeLint for lint checking.
- Involved in design and implementation of web tier using Servlets and JSP.
