Sr Software Engineer Resume San Jose, CA - Hire IT People

SUMMARY

7+ years of programming and software development experience with skills in design, development, and deployment of software systems from the development stage to the production stage in Machine Learning and Big Data technologies.
Experience in Big Data and Hadoop ecosystem tools such as MapReduce, Spark, Flink, Beam, Hive, Sqoop, Oozie, Storm, NiFi, and Kafka.
Work experience in building software solutions that include recommender systems, personalization, forecasting, conversational systems, outlier detection, and hypothesis testing.
Expert analytical and problem-solving skills; proficient in data-driven clustering, classification, ranking, and estimation techniques.
Experience with machine learning frameworks such as MLlib, TensorFlow, Caffe, Torch, or Theano
Develop, optimize, standardize and implement data science and machine learning solutions at scale in data pipelines and distributed systems (e.g., Hadoop/Spark ecosystem)
Optimize data science and machine learning models using high-performance computing (e.g., GPGPU) and real-time techniques.
Experience or strong interest in foundational machine learning models and concepts: regression, random forest, boosting, HMM, CRFs, MRFs, deep learning
Experience in creating Hive Queries and UDFs using Java for analysis of data efficiently.
Expert in using Sqoop to import data from different systems into HDFS for analysis and to export it back to the source system for further processing.
Good experience in optimizing MapReduce algorithms using Mappers, Reducers, Combiners, and Partitioners to deliver the best results for large datasets.
Provide in-depth expertise on evolving Kafka capabilities.
Identifying data points, data domains, and performing data modeling and transforming them to Hadoop/Data Lake.
Developed Kafka integrations, including topics, producers, consumers, and streaming (KStream and KTable) applications.
Designed and developed custom Kafka connectors to move data in both directions between Kafka topics and Redis/Kudu.
Built a real-time stream aggregation framework using KSQL. Leveraged KSQL materialized views for fast access to KTable data.
Set up a Kafka Connect cluster to move data between the Kafka cluster and various sources and sinks such as Postgres, Kudu, Redis, HDFS, Hive, and Elastic.
Developed a tool to read offsets from native Kafka topics to understand the latency in KSQL applications.
Developed Kafka Streams applications using the low-level API for stream-stream joins.
Contributor to the kafka-cp ansible GitHub repo for automating the deployment of various Kafka services.
Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
Experience using Kafka and Flume for streaming data transfers from different data sources into HDFS and HBase.
Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
Knowledge and experience of architecture and functionality of NoSQL DB like Cassandra and MongoDB.
Experience in working with NiFi to pull data from different APIs and sources.
Extensive experience in writing Spark Streaming applications to analyze real-time clickstream data.
Extensive experience in writing Spark core applications to perform aggregations on the data.
Experience in writing Storm topologies to analyze real-time data fed by NiFi.

TECHNICAL SKILLS

Big Data Technologies: Hadoop 2.x, HDFS, MapReduce, Storm, Spark, NiFi, Pig, Hive, Sqoop, Kafka, Oozie, Avro, Impala, Tez, YARN, and ZooKeeper.

Packages: Scikit-Learn, NumPy, SciPy, Pandas, NLTK, TensorFlow, PyTorch, Keras and Plotly

Programming Languages: Java, Scala, C, R, Go, Python, Octave and MATLAB.

NoSQL: HBase, DynamoDB, Bigtable, MongoDB, Cassandra and CosmosDB

Scripting/Web Technologies: JavaScript, HTML, XML, Shell Scripting, Python, Angular, Node.js, Ajax, and PHP

Databases: Kudu, Pinot, ClickHouse, Oracle, MySQL, PostgreSQL, SQL Server, and Teradata

Operating Systems: Linux, UNIX, and Windows.

Visualization Tools: Tableau, Qlik and QuickSight

PROFESSIONAL EXPERIENCE

Confidential, San Jose, CA

Sr Software Engineer

Responsibilities:

Built real-time APIs on a time-series database.
Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into HBase.
Developed APIs to expose the meetings data to customers.
Worked closely with other business analysts, development teams and infrastructure specialists to deliver high availability solutions for mission critical applications.
Orchestrated efficient large-scale software deployments, including testing features and correcting code.
Designed compliance frameworks for multi-site data warehousing efforts to verify conformity with state and federal data security guidelines.

Confidential, San Jose, CA

Sr Software Engineer

Responsibilities:

Performed real-time data analytics on the Confidential WebEx platform.
Ingested real-time call and meeting analytics into Kafka and performed analysis on the data.
Built and deployed a forecasting model to predict WebEx call quality.
The model is a multi-layered deep belief network that reads a real-time stream of users' call quality metrics and predicts the quality of the call.
Streamlined feature selection for the model that predicts call quality.
Contributed meaningful improvements to existing machine learning models through carefully directed research.
Derived actionable insights from massive data sets with minimal support.
Built a tree-based BRNN language model to analyze sentiment in customer reviews.
Created NiFi templates that pull real-time data from various APIs; the data is then aggregated by a Spark application.
Performed real-time stream aggregations using Kafka Streams and ksqlDB.

Confidential, NYC, NY

Big Data Engineer

Responsibilities:

Identified data points and data domains, analyzed complex ETLs, and performed data modeling and transformation to Hadoop/Data Lake.
Performed a proof of concept for moving the existing on-premises data warehouse and Hadoop to Google Cloud and Amazon cloud.
Developed an in-house framework for converting in-house SQL and stored procedures from native SQL to Hive SQL and BigQuery SQL.
Improved the performance of the flagship retargeting product by improving the prediction of ad engagement and user transactional behavior.
Built new models to extract more value from collected customer data and learn more about users and merchants.
Developed high-precision classifiers and tools leveraging machine learning, regression, and rule-based models.
Worked in technical teams on the development, deployment, and application of applied analytics, predictive analytics, and prescriptive analytics.
Built a framework for retiring IBM Netezza in favor of Hive, which bulk imports the data and converts the existing SQL.
Experience using Kafka and Flume for streaming data transfers from different data sources into HDFS and HBase
Designed and architected Spark programs in Scala to compare the performance of Spark with Hive and SQL, and developed Scala scripts using RDDs and DataFrames/SQL/Datasets in Spark 1.6 for data aggregation, queries, and writing data.
Created Kafka data pipelines to ingest credit data to Data Lake.
Developed and exercised automated infrastructure testing to ensure Kafka configuration changes or upgrades are not detrimental to Kafka-integrating applications.
Employed the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive and optimized the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and RDDs.
Developed MapReduce programs for input formats such as JSON, XML, and CSV.
Developed NiFi templates for API ingestion.
Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
Developed a multi-tier application on AWS.
Developed solutions on AWS Glue and Data Pipeline.
Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
Performed near real-time and batch Syslog/file ingestion into HDFS using Flume and Kafka.
Performed data filtration and transformation using Pig and created Hive schemas for the structured data.
Used RDDs to perform transformations on datasets as well as actions like count, reduce, and first.
Implemented various checkpoints on RDDs to disk to handle job failures and aid debugging.
Developed an integration solution to bridge the gap between external systems and HDFS, and reduced model run times through performance tuning so the business could run the model hundreds of times a day.

Confidential, Bellevue, WA

Big Data Developer

Responsibilities:

For Confidential, data is everything. Confidential's previous traditional EDW system was slow, highly modeled, and expensive, and did not meet the needs of business users.
To overcome the problem, Confidential created a data lake and could quickly glean new customer insights that previously could not be seen from small sets of data.
Developed MapReduce jobs for log analysis, recommendation, and analytics.
Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Processed HDFS data and created external tables using Hive, to analyze visitors per day, page views and most purchased products.
Implemented solutions to monitor Kafka components to proactively address any Kafka messaging issues.
Exported analyzed data to HDFS using Sqoop for generating reports.
Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
Developed Hive queries for the analysts.
Created NiFi templates that pull real-time data from various APIs; the data is then aggregated by a Spark application.
Created dashboards on the aggregated data.
Experience in using Spark SQL to perform aggregations on the data.
Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results; worked on application performance optimization for an HDFS cluster.
Enforced security standards as part of the overall Kafka implementation.
Designed and deployed full SDLC of AWS Hadoop cluster based on the client's business need.
Experience in BI reporting with AtScale OLAP for Big Data.
Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.

Confidential, Houston, TX

Hadoop Developer

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Wrote multiple MapReduce programs for data analysis.
Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
Developed Pig scripts for analyzing large data sets in the HDFS.
Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
Knowledge of handling Hive queries using Spark SQL that integrates with Spark environment.
Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
Implemented daily jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
Responsible for performing extensive data validation using Hive.
Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregation.
Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources. Implemented Hive generic UDFs to implement business logic.
Implemented test scripts to support the test-driven development and continuous integration.
Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Worked on tuning the performance of Pig queries.

Confidential

Software Developer

Responsibilities:

OpenText software applications manage content or unstructured data for most types of governance, efficiency and monetization requirements in large companies, government agencies and professional service firms.
OpenText solutions are aimed at addressing information management requirements, including the management of large volumes of content, compliance with regulatory requirements, and mobile and online experience management.
As a Software Developer, implemented solutions for ingesting data from various sources and processing data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, Hive, and Sqoop.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Developed Apache Pig scripts and Hive scripts to process the HDFS data.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results; worked on application performance optimization for an HDFS cluster.
Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
Created MapReduce jobs in Java for data cleaning and preprocessing. Used MapReduce algorithms to analyze data and generate sales reports.
Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
Converted logical database models to physical models to build/generate DDL scripts.
Extensively used ETL to load data from DB2 and Oracle databases.
