Big Data Lead Developer Resume
Dallas, TX
PROFESSIONAL SUMMARY:
- Big Data Analytics Engineer with 8+ years of IT experience, including 6+ years in Big Data, spanning development, system design, enhancement, maintenance, support, re-engineering, and debugging of mission-critical, complex applications.
- Experience with Apache Hadoop technologies such as HDFS, MapReduce, YARN, Pig, Hive, Impala, HCatalog, Sqoop, Spark, Kafka, StreamSets, Storm, Spark SQL, Spark Streaming, and Hadoop Streaming.
- Experience loading structured, semi-structured, and unstructured data into Hadoop from sources such as JSON, CSV, and XML files, Teradata, MS SQL Server, and Oracle.
- Experience importing and exporting data in different formats between HDFS/HBase and various RDBMS databases.
- Experience working with OpenStack (Icehouse, Liberty), Ansible, Kafka, Elasticsearch, MySQL, Cloudera, and MongoDB.
- Experienced with the OpenStack platform and all of its components, such as Compute, Orchestration, and Swift.
- Expertise across all SDLC phases: requirements gathering, system design, development, enhancement, maintenance, testing, deployment, production support, and documentation.
- Good exposure to the YARN environment with Spark and Kafka, and to file formats such as Avro, JSON, XML, and sequence files.
- Hands-on experience with Spark/Scala programming and good knowledge of the Spark architecture and its in-memory processing.
- Working experience creating complex data ingestion pipelines, data transformations, data management, and data governance in a centralized enterprise data hub.
- Experience building Kibana dashboards with visualizations on an Elasticsearch cluster.
- Experience working with streaming data: reading data from Kafka and processing it.
- Extensively worked with Kafka streaming libraries to consume data from web servers.
- Hands-on experience with Amazon Web Services (AWS) offerings such as EC2, S3, EBS, RDS, and VPC.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on AWS, including exporting and importing data to and from S3.
- Experience working with Azure Databricks and the Azure cloud platform.
- Knowledge of Azure Data Factory, Logic Apps, and Blob Storage.
- Experience storing and processing unstructured data using NoSQL databases such as HBase and MongoDB.
- Experience writing workflows and scheduling jobs using Oozie.
- Ability to lead a team and develop a project from scratch.
- Experience working with Apache NiFi.
TECHNICAL SKILLS:
Big Data Technologies: Apache Spark, HDFS, MapReduce, Hive, Pig, Oozie, Flume, ZooKeeper, Sqoop, HBase, Cassandra, Spark Streaming, Zeppelin, StreamSets, Kafka.
Database Skills: SQL, PL/SQL, MySQL, HBase, Hive, Impala.
OpenStack: Elasticsearch, Logstash, Ansible, RHEL 7, InfluxDB, Sensu, RabbitMQ, Uchiwa, Kibana.
Cloud Ecosystem: Amazon Web Services (EC2, EMR, and S3), Cisco Cloud Services.
Languages: Scala, Python, Java, SQL, Shell Scripting.
Build Tools: Maven, SBT, Chef, Jenkins and Gradle.
Operating Systems: UNIX, Windows, Linux
Other Tools: Tableau, Hue, RDP, PuTTY, WinSCP.
PROFESSIONAL EXPERIENCE:
Confidential, Dallas, TX
Big Data Lead Developer
Responsibilities:
- Designed and developed Big Data analytical solutions for the business and AI teams.
- Worked on data ingestion, loading data from mobile users and POS transactions.
- Developed an end-to-end pipeline to load data into Salesforce Marketing Cloud for campaigns.
- Built an automated process for data imports using Python.
- Extensively used Azure services: ADLS for data storage and ADF pipelines for triggering resource-intensive jobs.
- Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL and Blob Storage.
- Translated business problems into comprehensive analytical frameworks using advanced Big Data Hadoop/Spark techniques.
- Created Spark batch jobs that read daily data from Azure Data Lake file systems and store it in the Hive data warehouse.
- Built Hive tables to load the ps data from Salesforce.
- Built an automated process to ingest data into the Hive tables that are loaded to Salesforce.
- Built the end-to-end ETL pipeline flow, loading and transforming data every hour.
- Imported structured and unstructured data from legacy systems, processed it, and loaded it into Spark tables.
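The automated Python import process described above could be sketched roughly as follows; the file layout, field names, and transformations are illustrative assumptions, not the actual production code:

```python
import csv
import io
import json

def transform_row(row):
    """Normalize one raw record before loading it downstream (hypothetical fields)."""
    return {
        "user_id": row["user_id"].strip(),
        "amount": round(float(row["amount"]), 2),
        "channel": row.get("channel", "pos").lower(),
    }

def run_import(csv_text):
    """Convert a raw CSV export into JSON lines ready for the load step."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(transform_row(r)) for r in reader]

# Example input with the kind of noise (whitespace, casing) the transform cleans up.
raw = "user_id,amount,channel\n u1 ,19.999,POS\nu2,5.5,Mobile\n"
```

In the real pipeline the JSON lines would be written to ADLS or handed to an ADF-triggered load rather than kept in memory.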
Confidential, Dallas, TX
Big Data Analytics Engineer
Responsibilities:
- Responsible for developing a data pipeline using Kafka, HBase, Spark, Elasticsearch, and Hive to extract data from log messages and other sources, store it in HDFS, and visualize it in Kibana.
- Developed Spark/Scala applications to migrate historical and incremental data loads from different RDBMS systems into Hadoop.
- Developed a real-time data processing engine to stream data from Kafka and load it into Elasticsearch and Hive.
- Worked closely with the architect; enhanced and optimized product Spark and Scala code to aggregate, group, and run data mining tasks using the Spark framework.
- Developed a generic framework using Spark to stream data from Kafka, process/parse the JSON data, and store it in HBase; the framework is reused by various applications within the enterprise.
- Developed several Hive queries to create tables, partition and merge data from several sources, and perform analytics to load the data into fact tables.
- Built Kafka direct-stream libraries to consume real-time data.
- Built an HBase cluster to store and process the streaming data.
- Built an Elasticsearch cluster to store the processed and transformed data in indexes.
- Generated Kibana dashboards and visualizations from Elasticsearch indexes.
- Generated real-time reports to improve agent productivity and customer experience.
- Deployed the real-time data processing engine to production and built monitoring for the application.
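The parse step of a Kafka-to-HBase framework like the one above might look roughly like this in outline; the event fields are hypothetical, and the Kafka consumer and HBase writes are omitted:

```python
import json

def parse_event(message):
    """Parse one JSON log message into a flat record; return None on bad input."""
    try:
        event = json.loads(message)
    except json.JSONDecodeError:
        return None
    return {
        "agent": event.get("agent", "unknown"),
        "status": event["status"],
        "latency_ms": int(event["latency_ms"]),
    }

def process_batch(messages):
    """Drop malformed messages and keep the parsed records (the store step is elided)."""
    parsed = (parse_event(m) for m in messages)
    return [p for p in parsed if p is not None]
```

Tolerating malformed messages instead of failing the batch is what lets a streaming job keep running against noisy log sources.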
Confidential, Dallas, TX
Big Data Engineer
Responsibilities:
- Involved in complete project life cycle starting from design discussion to production deployment.
- Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Kafka, Spark and StreamSets.
- Performed transformations, cleaning and filtering on imported data using Hive and Spark.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Created automated python scripts to convert the data from different sources and to generate the ETL pipelines.
- Configured StreamSets to store the converted data in SQL Server using JDBC drivers.
- Extensively used big data analytical and processing tools Hive, Spark Core, Spark SQL for batch processing large data sets on Hadoop cluster.
- Improved performance and optimized existing algorithms in Hadoop using Spark: SparkContext, Spark SQL, DataFrames, RDDs, and YARN.
- Imported and exported data between environments such as MySQL and HDFS, and deployed to production.
- Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance.
- Developed Common Data Integrity and Data Sourcing Frameworks for all the risk applications, using Scala and Spark.
- Developed Oozie workflows with Sqoop actions to migrate data from relational databases such as Oracle, Netezza, and Teradata to HDFS.
- Extensively worked with Amazon S3 for data storage and retrieval.
- Worked with Alteryx, a data analytics tool, to develop workflows for ETL jobs.
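Hive bucketing, used above for query tuning, assigns rows to a fixed number of files by hashing the clustering column; a rough Python sketch of the idea (Hive's actual hash function differs, and the column and bucket count here are illustrative):

```python
import zlib

def bucket_for(key, num_buckets):
    """Mimic Hive-style bucketing: hash the clustering key into a fixed bucket.
    Python's built-in str hash is salted per process, so use a stable CRC32."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets

def split_into_buckets(rows, key_col, num_buckets):
    """Group rows into num_buckets lists, as a bucketed Hive table lays out files."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_for(row[key_col], num_buckets)].append(row)
    return buckets
```

Because equal keys always land in the same bucket, joins on the bucketed column can pair buckets directly, which is what makes bucketed map-side joins cheap.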
Confidential, Raleigh, NC
Big Data/Cloud Engineer
Responsibilities:
- Created StreamSets pipelines to collect logs, alerts, and metrics from customer Vpods.
- Created automated Python scripts to validate the data flowing through Elasticsearch. Worked with InfluxDB to store the metrics collected from each customer Vpod.
- Configured StreamSets to attach the Vpod ID to each record flowing through and to create topics in Kafka.
- Set up projects/tenants with Keystone user roles.
- Created instances in OpenStack to set up the environment.
- Set up the ELK (Elasticsearch, Logstash, Kibana) cluster.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
- Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Troubleshot Nova and Glance issues in OpenStack, as well as Kafka and the RabbitMQ bus.
- Configured MongoDB replication: replica set factors, arbiters, voting, priority, server distribution, and slave delays.
- Installed MongoDB from RPMs and tar files and prepared YAML config files.
- Performed data migration between multiple environments using the mongodump and mongorestore commands.
- Evaluated indexing strategies to support queries and sorting documents using index keys.
- Experience in converting Hive/SQL queries into Spark transformations using Scala.
- Optimized overall cluster performance by caching/persisting and unpersisting data.
- Experienced in working with Cloud Computing Services, Networking between different Tenants.
- Installed and Worked with Hive, Pig, Sqoop on the Hadoop cluster.
- Experience building the Kafka cluster setup required for the environment.
- Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
- Experience using benchmarking tools such as TeraSort, TestDFSIO, and HiBench.
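The MongoDB YAML configuration mentioned above might look roughly like this for one replica-set member; paths, port, and set name are placeholder values, not the actual deployment:

```yaml
# mongod.conf -- one replica-set member (illustrative values only)
storage:
  dbPath: /var/lib/mongo
systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
net:
  port: 27017
  bindIp: 0.0.0.0
replication:
  replSetName: rs0
```

Once each member is running with the same replSetName, the set is initiated from the mongo shell with rs.initiate(), and arbiters can be added with rs.addArb().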
Confidential, Logan, UT
Hadoop Developer
Responsibilities:
- Handled importing of data from various data sources, performed transformations using Hive, Spark and loaded data into HDFS.
- Worked on Data Processing using Hive queries in HDFS and the shell Scripts to wrap the HQL scripts.
- Developed and Deployed Oozie Workflows for recurring operations on Clusters.
- Used partitioning, bucketing, map-side joins, and parallel execution to optimize Hive queries, decreasing execution time.
- Designed and implemented importing data to HDFS using Sqoop from different RDBMS servers.
- Participated in requirement gathering of the project in documenting the business requirements.
- Used Ansible scripts to deploy Cloudera CDH 5.4.1 and set up the Hadoop cluster.
- Experienced in working with Cloud Computing Services, Networking between different Tenants.
- Dry-ran Ansible playbooks to provision the OpenStack cluster and deploy CDH parcels.
Confidential
Software Engineer
Responsibilities:
- Responsible for understanding the scope of the project and requirement gathering.
- Utilized strong C++ and Java programming and communication skills in a team environment.
- Extensively worked on the user interface for a few modules using JSPs, JavaScript, and Ajax.
- Developed framework for data processing using Design patterns, Java, XML.
- Used the Hibernate ORM framework with the Spring Framework for data persistence and transaction management.
- Developed a Type 3 application, a disaster recovery module, and console commands.
- Debugged and fixed bugs for the MMC project using GDB & stack traces.
- Solved memory allocation problems caused by big-endian and little-endian memory conversions.
- Contributed to support and maintenance of software applications with GCC & ICC compilers.
- Worked with the CMS & CCMS configuration management tools (proprietary ALU tools) and used FlexeLint for lint checking.
- Involved in design and implementation of web tier using Servlets and JSP.
